Introduction

Welcome to this training session on the programming language and software environment R! This document contains all of the code we are going to use in these sessions; it can be copied and pasted directly from here into your RStudio window so you can try it out.

Preliminaries

When you fire up RStudio, the window will look something like this:
[Screenshot: the RStudio window]

Most of your work will take place in the code editor window. This is where you write R scripts (a script is just a set of programming instructions) and save them to be run or edited later. When you run a script, or part of one, R sends the code to the console, which is where R evaluates it. You can also run code by typing it directly into the console, but this is mostly useful for checking that a command does what you want, or for quickly examining some of your data, because code entered there isn't saved into a document the way code in your code editor is.

You can see what I mean with a simple R program. Enter the command below into your console.

print("Hello World")
## [1] "Hello World"

Congrats! You’ve written your first R program. Note that R does exactly what you told it — it prints “Hello World” in the console. You can also run this same line of code by writing it in your code editor, highlighting it, and hitting Ctrl+Enter if you are on a Windows or Linux machine, or Cmd+Enter if you are on a Mac. When you run code from the code editor like this, note that the output still shows up down in the console.

The Social Ontology of R

Functions

There are two basic kinds of things that exist in the world of R: functions and objects. Functions look like this:

print()
table()
lm()

Functions tell R to do something. For example, this function tells R to output the square root of 49.

sqrt(49)

In this function, 49 is an argument. An argument is anything that goes into a function. In this case, sqrt() takes one argument — the number to find the square root of.

In R, functions can be nested, which means the output of one function can be used directly as the input for another. For example, the function max() finds the maximum of a set of numbers, and the function c() combines elements into a vector. These can be combined with sqrt() as follows:

sqrt(max(c(64, 81, 100)))

In principle, there is no limit to how many functions you can chain together like this. In practice, once you nest more than about three, R will have no problem reading your code, but you will be unable to read it after you walk away from it for more than ten minutes, and another person will find it largely illegible. Since, as two early computer scientists put it, “code is meant to be read by humans, and only incidentally interpreted by computers,” this can turn into a real problem. Fortunately, R has an elegant solution, which we will return to shortly.

Many functions take multiple arguments. For example, the function sample() is used to pick randomly from a set of elements. If I wanted to simulate rolling a twenty-sided die, I would use

sample(x=1:20, size=1)

In this case, the function sample() is taking two arguments: x, the set of elements to sample from, and size, the number of elements to draw. R matches unnamed arguments by position, so you often don’t need to include the name of an argument, just the value you want it to take. In the case of sample() above, the call below is equivalent to the code above:

sample(1:20, 1)
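
Because sample() draws at random, running the same call twice will usually give different results. If you want a draw you can reproduce — say, when checking your code — base R’s set.seed() function fixes the starting point of the random number generator. A small sketch:

```r
# set.seed() fixes the state of R's random number generator,
# so the same sequence of random draws repeats every time.
set.seed(42)
first_roll <- sample(1:20, 1)

set.seed(42)
second_roll <- sample(1:20, 1)

first_roll == second_roll  # TRUE: same seed, same draw
```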

Sometimes you’ll want to include the argument names to keep things straight for yourself; other times you’ll know a function well enough to do without them. To see which arguments a function takes, put a question mark before the function’s name. This will open the help file for that function in the lower-right pane of the RStudio window.

?sample

Take a minute now, and play around with the sample() function a little. See what works and what doesn’t. See if you can embed another function within your call of sample().
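
If you get stuck on that last part, here is one possible sketch, nesting max() inside the call to sample():

```r
# The inner call max(c(6, 12, 20)) evaluates to 20,
# so this samples one number between 1 and 20.
sample(x = 1:max(c(6, 12, 20)), size = 1)
```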

Objects

Objects are the other kind of thing in R: if something isn’t a function, it’s an object. When you use sample() to simulate a twenty-sided die, as above, the output of that function (let’s say it’s 3) is an object. 3 is also an object when you just type 3 into the console and hit Enter. When you enter an object (or the name of one) into the console, R will output that object.

There are lots and lots of different types of objects in R. You can find out what type of object something is by using the class() function.

class(3)
## [1] "numeric"

3 is, unsurprisingly, a numeric object. The other basic type of object in R is a character string. Character strings are denoted by quotation marks around them.

class("three")
## [1] "character"

What do you think this will return?

class("3")

There are a lot of other kinds of classes in R, but most of them are basically different kinds of combinations of numeric elements or string elements.
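
For instance, the logical values TRUE and FALSE form their own class, and comparisons produce logical values too:

```r
# TRUE and FALSE are logical objects
class(TRUE)
## [1] "logical"

# A comparison also evaluates to a logical value
class(5 > 3)
## [1] "logical"
```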

As noted above, the function c() concatenates, or joins, elements into a vector. A vector is a one-dimensional set of elements. You can have numeric vectors or character vectors.

c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
##  [1]  1  2  3  4  5  6  7  8  9 10
c("one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten")
##  [1] "one"   "two"   "three" "four"  "five"  "six"   "seven" "eight" "nine" 
## [10] "ten"

In R, all elements of a vector must be of the same type. If you try to make a vector with elements of different types, R will coerce them all to the same type.

c("one", 2, "three", 4)
## [1] "one"   "2"     "three" "4"
class(c("one", 2, "three", 4))
## [1] "character"

As you can see, a vector will have the class of its elements.
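
The coercion can also be undone element by element: the base function as.numeric() converts a character string back into a number, provided the string looks like one (a string like "one" becomes NA, with a warning):

```r
# "2" looks like a number, so as.numeric() recovers it
as.numeric("2")
## [1] 2

# "one" does not, so R returns NA (the missing value) with a warning
as.numeric("one")
## [1] NA
```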

R can handle extremely long vectors. The colon operator we used above with the sample() function generates a sequence of integers running between the numbers on either side of the colon, so we can use it to generate a vector of the numbers from 1 to 10,000.

1:10000
##     [1]     1     2     3     4     5     6     7     8     9    10    11    12
##    [13]    13    14    15    16    17    18    19    20    21    22    23    24
##    [25]    25    26    27    28    29    30    31    32    33    34    35    36
## ... (output continues through 10000)
##  [4069]  4069  4070  4071  4072  4073  4074  4075  4076  4077  4078  4079  4080
##  [4081]  4081  4082  4083  4084  4085  4086  4087  4088  4089  4090  4091  4092
##  [4093]  4093  4094  4095  4096  4097  4098  4099  4100  4101  4102  4103  4104
##  [4105]  4105  4106  4107  4108  4109  4110  4111  4112  4113  4114  4115  4116
##  [4117]  4117  4118  4119  4120  4121  4122  4123  4124  4125  4126  4127  4128
##  [4129]  4129  4130  4131  4132  4133  4134  4135  4136  4137  4138  4139  4140
##  [4141]  4141  4142  4143  4144  4145  4146  4147  4148  4149  4150  4151  4152
##  [4153]  4153  4154  4155  4156  4157  4158  4159  4160  4161  4162  4163  4164
##  [4165]  4165  4166  4167  4168  4169  4170  4171  4172  4173  4174  4175  4176
##  [4177]  4177  4178  4179  4180  4181  4182  4183  4184  4185  4186  4187  4188
##  [4189]  4189  4190  4191  4192  4193  4194  4195  4196  4197  4198  4199  4200
##  [4201]  4201  4202  4203  4204  4205  4206  4207  4208  4209  4210  4211  4212
##  [4213]  4213  4214  4215  4216  4217  4218  4219  4220  4221  4222  4223  4224
##  [4225]  4225  4226  4227  4228  4229  4230  4231  4232  4233  4234  4235  4236
##  [4237]  4237  4238  4239  4240  4241  4242  4243  4244  4245  4246  4247  4248
##  [4249]  4249  4250  4251  4252  4253  4254  4255  4256  4257  4258  4259  4260
##  [4261]  4261  4262  4263  4264  4265  4266  4267  4268  4269  4270  4271  4272
##  [4273]  4273  4274  4275  4276  4277  4278  4279  4280  4281  4282  4283  4284
##  [4285]  4285  4286  4287  4288  4289  4290  4291  4292  4293  4294  4295  4296
##  [4297]  4297  4298  4299  4300  4301  4302  4303  4304  4305  4306  4307  4308
##  [4309]  4309  4310  4311  4312  4313  4314  4315  4316  4317  4318  4319  4320
##  [4321]  4321  4322  4323  4324  4325  4326  4327  4328  4329  4330  4331  4332
##  [4333]  4333  4334  4335  4336  4337  4338  4339  4340  4341  4342  4343  4344
##  [4345]  4345  4346  4347  4348  4349  4350  4351  4352  4353  4354  4355  4356
##  [4357]  4357  4358  4359  4360  4361  4362  4363  4364  4365  4366  4367  4368
##  [4369]  4369  4370  4371  4372  4373  4374  4375  4376  4377  4378  4379  4380
##  [4381]  4381  4382  4383  4384  4385  4386  4387  4388  4389  4390  4391  4392
##  [4393]  4393  4394  4395  4396  4397  4398  4399  4400  4401  4402  4403  4404
##  [4405]  4405  4406  4407  4408  4409  4410  4411  4412  4413  4414  4415  4416
##  [4417]  4417  4418  4419  4420  4421  4422  4423  4424  4425  4426  4427  4428
##  [4429]  4429  4430  4431  4432  4433  4434  4435  4436  4437  4438  4439  4440
##  [4441]  4441  4442  4443  4444  4445  4446  4447  4448  4449  4450  4451  4452
##  [4453]  4453  4454  4455  4456  4457  4458  4459  4460  4461  4462  4463  4464
##  [4465]  4465  4466  4467  4468  4469  4470  4471  4472  4473  4474  4475  4476
##  [4477]  4477  4478  4479  4480  4481  4482  4483  4484  4485  4486  4487  4488
##  [4489]  4489  4490  4491  4492  4493  4494  4495  4496  4497  4498  4499  4500
##  [4501]  4501  4502  4503  4504  4505  4506  4507  4508  4509  4510  4511  4512
##  [4513]  4513  4514  4515  4516  4517  4518  4519  4520  4521  4522  4523  4524
##  [4525]  4525  4526  4527  4528  4529  4530  4531  4532  4533  4534  4535  4536
##  [4537]  4537  4538  4539  4540  4541  4542  4543  4544  4545  4546  4547  4548
##  [4549]  4549  4550  4551  4552  4553  4554  4555  4556  4557  4558  4559  4560
##  [4561]  4561  4562  4563  4564  4565  4566  4567  4568  4569  4570  4571  4572
##  [4573]  4573  4574  4575  4576  4577  4578  4579  4580  4581  4582  4583  4584
##  [4585]  4585  4586  4587  4588  4589  4590  4591  4592  4593  4594  4595  4596
##  [4597]  4597  4598  4599  4600  4601  4602  4603  4604  4605  4606  4607  4608
##  [4609]  4609  4610  4611  4612  4613  4614  4615  4616  4617  4618  4619  4620
##  [4621]  4621  4622  4623  4624  4625  4626  4627  4628  4629  4630  4631  4632
##  [4633]  4633  4634  4635  4636  4637  4638  4639  4640  4641  4642  4643  4644
##  [4645]  4645  4646  4647  4648  4649  4650  4651  4652  4653  4654  4655  4656
##  [4657]  4657  4658  4659  4660  4661  4662  4663  4664  4665  4666  4667  4668
##  [4669]  4669  4670  4671  4672  4673  4674  4675  4676  4677  4678  4679  4680
##  [4681]  4681  4682  4683  4684  4685  4686  4687  4688  4689  4690  4691  4692
##  [4693]  4693  4694  4695  4696  4697  4698  4699  4700  4701  4702  4703  4704
##  [4705]  4705  4706  4707  4708  4709  4710  4711  4712  4713  4714  4715  4716
##  [4717]  4717  4718  4719  4720  4721  4722  4723  4724  4725  4726  4727  4728
##  [4729]  4729  4730  4731  4732  4733  4734  4735  4736  4737  4738  4739  4740
##  [4741]  4741  4742  4743  4744  4745  4746  4747  4748  4749  4750  4751  4752
##  [4753]  4753  4754  4755  4756  4757  4758  4759  4760  4761  4762  4763  4764
##  [4765]  4765  4766  4767  4768  4769  4770  4771  4772  4773  4774  4775  4776
##  [4777]  4777  4778  4779  4780  4781  4782  4783  4784  4785  4786  4787  4788
##  [4789]  4789  4790  4791  4792  4793  4794  4795  4796  4797  4798  4799  4800
##  [4801]  4801  4802  4803  4804  4805  4806  4807  4808  4809  4810  4811  4812
##  [4813]  4813  4814  4815  4816  4817  4818  4819  4820  4821  4822  4823  4824
##  [4825]  4825  4826  4827  4828  4829  4830  4831  4832  4833  4834  4835  4836
##  [4837]  4837  4838  4839  4840  4841  4842  4843  4844  4845  4846  4847  4848
##  [4849]  4849  4850  4851  4852  4853  4854  4855  4856  4857  4858  4859  4860
##  [4861]  4861  4862  4863  4864  4865  4866  4867  4868  4869  4870  4871  4872
##  [4873]  4873  4874  4875  4876  4877  4878  4879  4880  4881  4882  4883  4884
##  [4885]  4885  4886  4887  4888  4889  4890  4891  4892  4893  4894  4895  4896
##  [4897]  4897  4898  4899  4900  4901  4902  4903  4904  4905  4906  4907  4908
##  [4909]  4909  4910  4911  4912  4913  4914  4915  4916  4917  4918  4919  4920
##  [4921]  4921  4922  4923  4924  4925  4926  4927  4928  4929  4930  4931  4932
##  [4933]  4933  4934  4935  4936  4937  4938  4939  4940  4941  4942  4943  4944
##  [4945]  4945  4946  4947  4948  4949  4950  4951  4952  4953  4954  4955  4956
##  [4957]  4957  4958  4959  4960  4961  4962  4963  4964  4965  4966  4967  4968
##  [4969]  4969  4970  4971  4972  4973  4974  4975  4976  4977  4978  4979  4980
##  [4981]  4981  4982  4983  4984  4985  4986  4987  4988  4989  4990  4991  4992
##  [4993]  4993  4994  4995  4996  4997  4998  4999  5000  5001  5002  5003  5004
##  [5005]  5005  5006  5007  5008  5009  5010  5011  5012  5013  5014  5015  5016
##  [5017]  5017  5018  5019  5020  5021  5022  5023  5024  5025  5026  5027  5028
##  [5029]  5029  5030  5031  5032  5033  5034  5035  5036  5037  5038  5039  5040
##  [5041]  5041  5042  5043  5044  5045  5046  5047  5048  5049  5050  5051  5052
##  [5053]  5053  5054  5055  5056  5057  5058  5059  5060  5061  5062  5063  5064
##  [5065]  5065  5066  5067  5068  5069  5070  5071  5072  5073  5074  5075  5076
##  [5077]  5077  5078  5079  5080  5081  5082  5083  5084  5085  5086  5087  5088
##  [5089]  5089  5090  5091  5092  5093  5094  5095  5096  5097  5098  5099  5100
##  [5101]  5101  5102  5103  5104  5105  5106  5107  5108  5109  5110  5111  5112
##  [5113]  5113  5114  5115  5116  5117  5118  5119  5120  5121  5122  5123  5124
##  [5125]  5125  5126  5127  5128  5129  5130  5131  5132  5133  5134  5135  5136
##  [5137]  5137  5138  5139  5140  5141  5142  5143  5144  5145  5146  5147  5148
##  [5149]  5149  5150  5151  5152  5153  5154  5155  5156  5157  5158  5159  5160
##  [5161]  5161  5162  5163  5164  5165  5166  5167  5168  5169  5170  5171  5172
##  [5173]  5173  5174  5175  5176  5177  5178  5179  5180  5181  5182  5183  5184
##  [5185]  5185  5186  5187  5188  5189  5190  5191  5192  5193  5194  5195  5196
##  [5197]  5197  5198  5199  5200  5201  5202  5203  5204  5205  5206  5207  5208
##  [5209]  5209  5210  5211  5212  5213  5214  5215  5216  5217  5218  5219  5220
##  [5221]  5221  5222  5223  5224  5225  5226  5227  5228  5229  5230  5231  5232
##  [5233]  5233  5234  5235  5236  5237  5238  5239  5240  5241  5242  5243  5244
##  [5245]  5245  5246  5247  5248  5249  5250  5251  5252  5253  5254  5255  5256
##  [5257]  5257  5258  5259  5260  5261  5262  5263  5264  5265  5266  5267  5268
##  [5269]  5269  5270  5271  5272  5273  5274  5275  5276  5277  5278  5279  5280
##  [5281]  5281  5282  5283  5284  5285  5286  5287  5288  5289  5290  5291  5292
##  [5293]  5293  5294  5295  5296  5297  5298  5299  5300  5301  5302  5303  5304
##  [5305]  5305  5306  5307  5308  5309  5310  5311  5312  5313  5314  5315  5316
##  [5317]  5317  5318  5319  5320  5321  5322  5323  5324  5325  5326  5327  5328
##  [5329]  5329  5330  5331  5332  5333  5334  5335  5336  5337  5338  5339  5340
##  [5341]  5341  5342  5343  5344  5345  5346  5347  5348  5349  5350  5351  5352
##  [5353]  5353  5354  5355  5356  5357  5358  5359  5360  5361  5362  5363  5364
##  [5365]  5365  5366  5367  5368  5369  5370  5371  5372  5373  5374  5375  5376
##  [5377]  5377  5378  5379  5380  5381  5382  5383  5384  5385  5386  5387  5388
##  [5389]  5389  5390  5391  5392  5393  5394  5395  5396  5397  5398  5399  5400
##  [5401]  5401  5402  5403  5404  5405  5406  5407  5408  5409  5410  5411  5412
##  [5413]  5413  5414  5415  5416  5417  5418  5419  5420  5421  5422  5423  5424
##  [5425]  5425  5426  5427  5428  5429  5430  5431  5432  5433  5434  5435  5436
##  [5437]  5437  5438  5439  5440  5441  5442  5443  5444  5445  5446  5447  5448
##  [5449]  5449  5450  5451  5452  5453  5454  5455  5456  5457  5458  5459  5460
##  [5461]  5461  5462  5463  5464  5465  5466  5467  5468  5469  5470  5471  5472
##  [5473]  5473  5474  5475  5476  5477  5478  5479  5480  5481  5482  5483  5484
##  [5485]  5485  5486  5487  5488  5489  5490  5491  5492  5493  5494  5495  5496
##  [5497]  5497  5498  5499  5500  5501  5502  5503  5504  5505  5506  5507  5508
##  [5509]  5509  5510  5511  5512  5513  5514  5515  5516  5517  5518  5519  5520
##  [5521]  5521  5522  5523  5524  5525  5526  5527  5528  5529  5530  5531  5532
##  [5533]  5533  5534  5535  5536  5537  5538  5539  5540  5541  5542  5543  5544
##  [5545]  5545  5546  5547  5548  5549  5550  5551  5552  5553  5554  5555  5556
##  [5557]  5557  5558  5559  5560  5561  5562  5563  5564  5565  5566  5567  5568
##  [5569]  5569  5570  5571  5572  5573  5574  5575  5576  5577  5578  5579  5580
##  [5581]  5581  5582  5583  5584  5585  5586  5587  5588  5589  5590  5591  5592
##  [5593]  5593  5594  5595  5596  5597  5598  5599  5600  5601  5602  5603  5604
##  [5605]  5605  5606  5607  5608  5609  5610  5611  5612  5613  5614  5615  5616
##  [5617]  5617  5618  5619  5620  5621  5622  5623  5624  5625  5626  5627  5628
##  [5629]  5629  5630  5631  5632  5633  5634  5635  5636  5637  5638  5639  5640
##  [5641]  5641  5642  5643  5644  5645  5646  5647  5648  5649  5650  5651  5652
##  [5653]  5653  5654  5655  5656  5657  5658  5659  5660  5661  5662  5663  5664
##  [5665]  5665  5666  5667  5668  5669  5670  5671  5672  5673  5674  5675  5676
##  [5677]  5677  5678  5679  5680  5681  5682  5683  5684  5685  5686  5687  5688
##  [5689]  5689  5690  5691  5692  5693  5694  5695  5696  5697  5698  5699  5700
##  [5701]  5701  5702  5703  5704  5705  5706  5707  5708  5709  5710  5711  5712
##  [5713]  5713  5714  5715  5716  5717  5718  5719  5720  5721  5722  5723  5724
##  [5725]  5725  5726  5727  5728  5729  5730  5731  5732  5733  5734  5735  5736
##  [5737]  5737  5738  5739  5740  5741  5742  5743  5744  5745  5746  5747  5748
##  [5749]  5749  5750  5751  5752  5753  5754  5755  5756  5757  5758  5759  5760
##  [5761]  5761  5762  5763  5764  5765  5766  5767  5768  5769  5770  5771  5772
##  [5773]  5773  5774  5775  5776  5777  5778  5779  5780  5781  5782  5783  5784
##  [5785]  5785  5786  5787  5788  5789  5790  5791  5792  5793  5794  5795  5796
##  [5797]  5797  5798  5799  5800  5801  5802  5803  5804  5805  5806  5807  5808
##  [5809]  5809  5810  5811  5812  5813  5814  5815  5816  5817  5818  5819  5820
##  [5821]  5821  5822  5823  5824  5825  5826  5827  5828  5829  5830  5831  5832
##  [5833]  5833  5834  5835  5836  5837  5838  5839  5840  5841  5842  5843  5844
##  [5845]  5845  5846  5847  5848  5849  5850  5851  5852  5853  5854  5855  5856
##  [5857]  5857  5858  5859  5860  5861  5862  5863  5864  5865  5866  5867  5868
##  [5869]  5869  5870  5871  5872  5873  5874  5875  5876  5877  5878  5879  5880
##  [5881]  5881  5882  5883  5884  5885  5886  5887  5888  5889  5890  5891  5892
##  [5893]  5893  5894  5895  5896  5897  5898  5899  5900  5901  5902  5903  5904
##  [5905]  5905  5906  5907  5908  5909  5910  5911  5912  5913  5914  5915  5916
##  [5917]  5917  5918  5919  5920  5921  5922  5923  5924  5925  5926  5927  5928
##  [5929]  5929  5930  5931  5932  5933  5934  5935  5936  5937  5938  5939  5940
##  [5941]  5941  5942  5943  5944  5945  5946  5947  5948  5949  5950  5951  5952
##  [5953]  5953  5954  5955  5956  5957  5958  5959  5960  5961  5962  5963  5964
##  [5965]  5965  5966  5967  5968  5969  5970  5971  5972  5973  5974  5975  5976
##  [5977]  5977  5978  5979  5980  5981  5982  5983  5984  5985  5986  5987  5988
##  [5989]  5989  5990  5991  5992  5993  5994  5995  5996  5997  5998  5999  6000
##  [6001]  6001  6002  6003  6004  6005  6006  6007  6008  6009  6010  6011  6012
##  [6013]  6013  6014  6015  6016  6017  6018  6019  6020  6021  6022  6023  6024
##  [6025]  6025  6026  6027  6028  6029  6030  6031  6032  6033  6034  6035  6036
##  [6037]  6037  6038  6039  6040  6041  6042  6043  6044  6045  6046  6047  6048
##  [6049]  6049  6050  6051  6052  6053  6054  6055  6056  6057  6058  6059  6060
##  [6061]  6061  6062  6063  6064  6065  6066  6067  6068  6069  6070  6071  6072
##  [6073]  6073  6074  6075  6076  6077  6078  6079  6080  6081  6082  6083  6084
##  [6085]  6085  6086  6087  6088  6089  6090  6091  6092  6093  6094  6095  6096
##  [6097]  6097  6098  6099  6100  6101  6102  6103  6104  6105  6106  6107  6108
##  [6109]  6109  6110  6111  6112  6113  6114  6115  6116  6117  6118  6119  6120
##  [6121]  6121  6122  6123  6124  6125  6126  6127  6128  6129  6130  6131  6132
##  [6133]  6133  6134  6135  6136  6137  6138  6139  6140  6141  6142  6143  6144
##  [6145]  6145  6146  6147  6148  6149  6150  6151  6152  6153  6154  6155  6156
##  [6157]  6157  6158  6159  6160  6161  6162  6163  6164  6165  6166  6167  6168
##  [6169]  6169  6170  6171  6172  6173  6174  6175  6176  6177  6178  6179  6180
##  [6181]  6181  6182  6183  6184  6185  6186  6187  6188  6189  6190  6191  6192
##  [6193]  6193  6194  6195  6196  6197  6198  6199  6200  6201  6202  6203  6204
##  [6205]  6205  6206  6207  6208  6209  6210  6211  6212  6213  6214  6215  6216
##  [6217]  6217  6218  6219  6220  6221  6222  6223  6224  6225  6226  6227  6228
##  [6229]  6229  6230  6231  6232  6233  6234  6235  6236  6237  6238  6239  6240
##  [6241]  6241  6242  6243  6244  6245  6246  6247  6248  6249  6250  6251  6252
##  [6253]  6253  6254  6255  6256  6257  6258  6259  6260  6261  6262  6263  6264
##  [6265]  6265  6266  6267  6268  6269  6270  6271  6272  6273  6274  6275  6276
##  [6277]  6277  6278  6279  6280  6281  6282  6283  6284  6285  6286  6287  6288
##  [6289]  6289  6290  6291  6292  6293  6294  6295  6296  6297  6298  6299  6300
##  [6301]  6301  6302  6303  6304  6305  6306  6307  6308  6309  6310  6311  6312
##  [6313]  6313  6314  6315  6316  6317  6318  6319  6320  6321  6322  6323  6324
##  [6325]  6325  6326  6327  6328  6329  6330  6331  6332  6333  6334  6335  6336
##  [6337]  6337  6338  6339  6340  6341  6342  6343  6344  6345  6346  6347  6348
##  [6349]  6349  6350  6351  6352  6353  6354  6355  6356  6357  6358  6359  6360
##  [6361]  6361  6362  6363  6364  6365  6366  6367  6368  6369  6370  6371  6372
##  [6373]  6373  6374  6375  6376  6377  6378  6379  6380  6381  6382  6383  6384
##  [6385]  6385  6386  6387  6388  6389  6390  6391  6392  6393  6394  6395  6396
##  [6397]  6397  6398  6399  6400  6401  6402  6403  6404  6405  6406  6407  6408
##  [6409]  6409  6410  6411  6412  6413  6414  6415  6416  6417  6418  6419  6420
##  [6421]  6421  6422  6423  6424  6425  6426  6427  6428  6429  6430  6431  6432
##  [6433]  6433  6434  6435  6436  6437  6438  6439  6440  6441  6442  6443  6444
##  [6445]  6445  6446  6447  6448  6449  6450  6451  6452  6453  6454  6455  6456
##  [6457]  6457  6458  6459  6460  6461  6462  6463  6464  6465  6466  6467  6468
##  [6469]  6469  6470  6471  6472  6473  6474  6475  6476  6477  6478  6479  6480
##  [6481]  6481  6482  6483  6484  6485  6486  6487  6488  6489  6490  6491  6492
##  [6493]  6493  6494  6495  6496  6497  6498  6499  6500  6501  6502  6503  6504
##  [6505]  6505  6506  6507  6508  6509  6510  6511  6512  6513  6514  6515  6516
##  [6517]  6517  6518  6519  6520  6521  6522  6523  6524  6525  6526  6527  6528
##  [6529]  6529  6530  6531  6532  6533  6534  6535  6536  6537  6538  6539  6540
##  [6541]  6541  6542  6543  6544  6545  6546  6547  6548  6549  6550  6551  6552
##  [6553]  6553  6554  6555  6556  6557  6558  6559  6560  6561  6562  6563  6564
##  [6565]  6565  6566  6567  6568  6569  6570  6571  6572  6573  6574  6575  6576
##  [6577]  6577  6578  6579  6580  6581  6582  6583  6584  6585  6586  6587  6588
##  [6589]  6589  6590  6591  6592  6593  6594  6595  6596  6597  6598  6599  6600
##  [6601]  6601  6602  6603  6604  6605  6606  6607  6608  6609  6610  6611  6612
##  [6613]  6613  6614  6615  6616  6617  6618  6619  6620  6621  6622  6623  6624
##  [6625]  6625  6626  6627  6628  6629  6630  6631  6632  6633  6634  6635  6636
##  [6637]  6637  6638  6639  6640  6641  6642  6643  6644  6645  6646  6647  6648
##  [6649]  6649  6650  6651  6652  6653  6654  6655  6656  6657  6658  6659  6660
##  [6661]  6661  6662  6663  6664  6665  6666  6667  6668  6669  6670  6671  6672
##  [6673]  6673  6674  6675  6676  6677  6678  6679  6680  6681  6682  6683  6684
##  [6685]  6685  6686  6687  6688  6689  6690  6691  6692  6693  6694  6695  6696
##  [6697]  6697  6698  6699  6700  6701  6702  6703  6704  6705  6706  6707  6708
##  [6709]  6709  6710  6711  6712  6713  6714  6715  6716  6717  6718  6719  6720
##  [6721]  6721  6722  6723  6724  6725  6726  6727  6728  6729  6730  6731  6732
##  [6733]  6733  6734  6735  6736  6737  6738  6739  6740  6741  6742  6743  6744
##  [6745]  6745  6746  6747  6748  6749  6750  6751  6752  6753  6754  6755  6756
##  [6757]  6757  6758  6759  6760  6761  6762  6763  6764  6765  6766  6767  6768
##  [6769]  6769  6770  6771  6772  6773  6774  6775  6776  6777  6778  6779  6780
##  [6781]  6781  6782  6783  6784  6785  6786  6787  6788  6789  6790  6791  6792
##  [6793]  6793  6794  6795  6796  6797  6798  6799  6800  6801  6802  6803  6804
##  [6805]  6805  6806  6807  6808  6809  6810  6811  6812  6813  6814  6815  6816
##  [6817]  6817  6818  6819  6820  6821  6822  6823  6824  6825  6826  6827  6828
##  [6829]  6829  6830  6831  6832  6833  6834  6835  6836  6837  6838  6839  6840
##  [6841]  6841  6842  6843  6844  6845  6846  6847  6848  6849  6850  6851  6852
##  [6853]  6853  6854  6855  6856  6857  6858  6859  6860  6861  6862  6863  6864
##  [6865]  6865  6866  6867  6868  6869  6870  6871  6872  6873  6874  6875  6876
##  [6877]  6877  6878  6879  6880  6881  6882  6883  6884  6885  6886  6887  6888
##  [6889]  6889  6890  6891  6892  6893  6894  6895  6896  6897  6898  6899  6900
##  [6901]  6901  6902  6903  6904  6905  6906  6907  6908  6909  6910  6911  6912
##  [6913]  6913  6914  6915  6916  6917  6918  6919  6920  6921  6922  6923  6924
##  [6925]  6925  6926  6927  6928  6929  6930  6931  6932  6933  6934  6935  6936
##  [6937]  6937  6938  6939  6940  6941  6942  6943  6944  6945  6946  6947  6948
##  [6949]  6949  6950  6951  6952  6953  6954  6955  6956  6957  6958  6959  6960
##  [6961]  6961  6962  6963  6964  6965  6966  6967  6968  6969  6970  6971  6972
##  [6973]  6973  6974  6975  6976  6977  6978  6979  6980  6981  6982  6983  6984
##  [6985]  6985  6986  6987  6988  6989  6990  6991  6992  6993  6994  6995  6996
##  [6997]  6997  6998  6999  7000  7001  7002  7003  7004  7005  7006  7007  7008
##  [7009]  7009  7010  7011  7012  7013  7014  7015  7016  7017  7018  7019  7020
##  [7021]  7021  7022  7023  7024  7025  7026  7027  7028  7029  7030  7031  7032
##  [7033]  7033  7034  7035  7036  7037  7038  7039  7040  7041  7042  7043  7044
##  [7045]  7045  7046  7047  7048  7049  7050  7051  7052  7053  7054  7055  7056
##  [7057]  7057  7058  7059  7060  7061  7062  7063  7064  7065  7066  7067  7068
##  [7069]  7069  7070  7071  7072  7073  7074  7075  7076  7077  7078  7079  7080
##  [7081]  7081  7082  7083  7084  7085  7086  7087  7088  7089  7090  7091  7092
##  [7093]  7093  7094  7095  7096  7097  7098  7099  7100  7101  7102  7103  7104
##  [7105]  7105  7106  7107  7108  7109  7110  7111  7112  7113  7114  7115  7116
##  [7117]  7117  7118  7119  7120  7121  7122  7123  7124  7125  7126  7127  7128
##  [7129]  7129  7130  7131  7132  7133  7134  7135  7136  7137  7138  7139  7140
##  [7141]  7141  7142  7143  7144  7145  7146  7147  7148  7149  7150  7151  7152
##  [7153]  7153  7154  7155  7156  7157  7158  7159  7160  7161  7162  7163  7164
##  [7165]  7165  7166  7167  7168  7169  7170  7171  7172  7173  7174  7175  7176
##  [7177]  7177  7178  7179  7180  7181  7182  7183  7184  7185  7186  7187  7188
##  [7189]  7189  7190  7191  7192  7193  7194  7195  7196  7197  7198  7199  7200
##  [7201]  7201  7202  7203  7204  7205  7206  7207  7208  7209  7210  7211  7212
##  [7213]  7213  7214  7215  7216  7217  7218  7219  7220  7221  7222  7223  7224
##  [7225]  7225  7226  7227  7228  7229  7230  7231  7232  7233  7234  7235  7236
##  [7237]  7237  7238  7239  7240  7241  7242  7243  7244  7245  7246  7247  7248
##  [7249]  7249  7250  7251  7252  7253  7254  7255  7256  7257  7258  7259  7260
##  [7261]  7261  7262  7263  7264  7265  7266  7267  7268  7269  7270  7271  7272
##  [7273]  7273  7274  7275  7276  7277  7278  7279  7280  7281  7282  7283  7284
##  [7285]  7285  7286  7287  7288  7289  7290  7291  7292  7293  7294  7295  7296
##  [7297]  7297  7298  7299  7300  7301  7302  7303  7304  7305  7306  7307  7308
##  [7309]  7309  7310  7311  7312  7313  7314  7315  7316  7317  7318  7319  7320
##  [7321]  7321  7322  7323  7324  7325  7326  7327  7328  7329  7330  7331  7332
##  [7333]  7333  7334  7335  7336  7337  7338  7339  7340  7341  7342  7343  7344
##  [7345]  7345  7346  7347  7348  7349  7350  7351  7352  7353  7354  7355  7356
##  [7357]  7357  7358  7359  7360  7361  7362  7363  7364  7365  7366  7367  7368
##  [7369]  7369  7370  7371  7372  7373  7374  7375  7376  7377  7378  7379  7380
##  [7381]  7381  7382  7383  7384  7385  7386  7387  7388  7389  7390  7391  7392
##  [7393]  7393  7394  7395  7396  7397  7398  7399  7400  7401  7402  7403  7404
##  [7405]  7405  7406  7407  7408  7409  7410  7411  7412  7413  7414  7415  7416
##  [7417]  7417  7418  7419  7420  7421  7422  7423  7424  7425  7426  7427  7428
##  [7429]  7429  7430  7431  7432  7433  7434  7435  7436  7437  7438  7439  7440
##  [7441]  7441  7442  7443  7444  7445  7446  7447  7448  7449  7450  7451  7452
##  [7453]  7453  7454  7455  7456  7457  7458  7459  7460  7461  7462  7463  7464
##  [7465]  7465  7466  7467  7468  7469  7470  7471  7472  7473  7474  7475  7476
##  [7477]  7477  7478  7479  7480  7481  7482  7483  7484  7485  7486  7487  7488
##  [7489]  7489  7490  7491  7492  7493  7494  7495  7496  7497  7498  7499  7500
##  [7501]  7501  7502  7503  7504  7505  7506  7507  7508  7509  7510  7511  7512
##  [7513]  7513  7514  7515  7516  7517  7518  7519  7520  7521  7522  7523  7524
##  [7525]  7525  7526  7527  7528  7529  7530  7531  7532  7533  7534  7535  7536
##  [7537]  7537  7538  7539  7540  7541  7542  7543  7544  7545  7546  7547  7548
##  [7549]  7549  7550  7551  7552  7553  7554  7555  7556  7557  7558  7559  7560
##  [7561]  7561  7562  7563  7564  7565  7566  7567  7568  7569  7570  7571  7572
##  [7573]  7573  7574  7575  7576  7577  7578  7579  7580  7581  7582  7583  7584
##  [7585]  7585  7586  7587  7588  7589  7590  7591  7592  7593  7594  7595  7596
##  [7597]  7597  7598  7599  7600  7601  7602  7603  7604  7605  7606  7607  7608
##  [7609]  7609  7610  7611  7612  7613  7614  7615  7616  7617  7618  7619  7620
##  [7621]  7621  7622  7623  7624  7625  7626  7627  7628  7629  7630  7631  7632
##  [7633]  7633  7634  7635  7636  7637  7638  7639  7640  7641  7642  7643  7644
##  [7645]  7645  7646  7647  7648  7649  7650  7651  7652  7653  7654  7655  7656
##  [7657]  7657  7658  7659  7660  7661  7662  7663  7664  7665  7666  7667  7668
##  [7669]  7669  7670  7671  7672  7673  7674  7675  7676  7677  7678  7679  7680
##  [7681]  7681  7682  7683  7684  7685  7686  7687  7688  7689  7690  7691  7692
##  [7693]  7693  7694  7695  7696  7697  7698  7699  7700  7701  7702  7703  7704
##  [7705]  7705  7706  7707  7708  7709  7710  7711  7712  7713  7714  7715  7716
##  [7717]  7717  7718  7719  7720  7721  7722  7723  7724  7725  7726  7727  7728
##  [7729]  7729  7730  7731  7732  7733  7734  7735  7736  7737  7738  7739  7740
##  [7741]  7741  7742  7743  7744  7745  7746  7747  7748  7749  7750  7751  7752
##  [7753]  7753  7754  7755  7756  7757  7758  7759  7760  7761  7762  7763  7764
##  [7765]  7765  7766  7767  7768  7769  7770  7771  7772  7773  7774  7775  7776
##  [7777]  7777  7778  7779  7780  7781  7782  7783  7784  7785  7786  7787  7788
##  [7789]  7789  7790  7791  7792  7793  7794  7795  7796  7797  7798  7799  7800
##  [7801]  7801  7802  7803  7804  7805  7806  7807  7808  7809  7810  7811  7812
##  [7813]  7813  7814  7815  7816  7817  7818  7819  7820  7821  7822  7823  7824
##  [7825]  7825  7826  7827  7828  7829  7830  7831  7832  7833  7834  7835  7836
##  [7837]  7837  7838  7839  7840  7841  7842  7843  7844  7845  7846  7847  7848
##  [7849]  7849  7850  7851  7852  7853  7854  7855  7856  7857  7858  7859  7860
##  [7861]  7861  7862  7863  7864  7865  7866  7867  7868  7869  7870  7871  7872
##  [7873]  7873  7874  7875  7876  7877  7878  7879  7880  7881  7882  7883  7884
##  [7885]  7885  7886  7887  7888  7889  7890  7891  7892  7893  7894  7895  7896
##  [7897]  7897  7898  7899  7900  7901  7902  7903  7904  7905  7906  7907  7908
##  [7909]  7909  7910  7911  7912  7913  7914  7915  7916  7917  7918  7919  7920
##  [7921]  7921  7922  7923  7924  7925  7926  7927  7928  7929  7930  7931  7932
##  [7933]  7933  7934  7935  7936  7937  7938  7939  7940  7941  7942  7943  7944
##  [7945]  7945  7946  7947  7948  7949  7950  7951  7952  7953  7954  7955  7956
##  [7957]  7957  7958  7959  7960  7961  7962  7963  7964  7965  7966  7967  7968
##  [7969]  7969  7970  7971  7972  7973  7974  7975  7976  7977  7978  7979  7980
##  [7981]  7981  7982  7983  7984  7985  7986  7987  7988  7989  7990  7991  7992
##  [7993]  7993  7994  7995  7996  7997  7998  7999  8000  8001  8002  8003  8004
##  [8005]  8005  8006  8007  8008  8009  8010  8011  8012  8013  8014  8015  8016
##  [8017]  8017  8018  8019  8020  8021  8022  8023  8024  8025  8026  8027  8028
##  [8029]  8029  8030  8031  8032  8033  8034  8035  8036  8037  8038  8039  8040
##  [8041]  8041  8042  8043  8044  8045  8046  8047  8048  8049  8050  8051  8052
##  [8053]  8053  8054  8055  8056  8057  8058  8059  8060  8061  8062  8063  8064
##  [8065]  8065  8066  8067  8068  8069  8070  8071  8072  8073  8074  8075  8076
##  [8077]  8077  8078  8079  8080  8081  8082  8083  8084  8085  8086  8087  8088
##  [8089]  8089  8090  8091  8092  8093  8094  8095  8096  8097  8098  8099  8100
## ... [output continues in this pattern] ...
## [9997]  9997  9998  9999 10000

When you make a vector like any of the ones above, all you are really telling R to do is display the vector you’ve defined in the console. Most of the time, though, you will want to do something with a vector, which usually means using it as an argument to a function. Rather than redefining the vector every time, you can assign it a name. This saves the vector, allowing you to use it in multiple operations. In R, assignment takes place via the <- or -> operators. These two lines of code do the same thing. Notice that when you assign a name to an object, it shows up in the upper right quadrant of RStudio.

c(1,2,3,4,5) -> numbers

numbers <- c(1,2,3,4,5)

Note that R is case sensitive. Entering “numbers” into the console will output our vector. Entering “Numbers” will not.

numbers
## [1] 1 2 3 4 5
Numbers #This will produce an "object not found" error

Having assigned this vector the name numbers, we can now easily use it in different functions.

sqrt(numbers)
## [1] 1.000000 1.414214 1.732051 2.000000 2.236068
max(numbers)
## [1] 5

The output of one function can be assigned a name, and stored as a new object.

sqrt(numbers) -> numbers.root
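Entering the new name confirms what was stored:

numbers.root
## [1] 1.000000 1.414214 1.732051 2.000000 2.236068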

Exercise 1

Here’s a code reading exercise. Try to explain what each line of this code will do. You might have to look back up this document to review!

5 -> a
b <- c(1:10, 3,3, 900)
sample(b, a)

Exercise 2

Now try going from English instructions to code. Write code that will create a vector of at least five strings, and then sample two strings from that vector.

Exercise 3

What happens when you try to concatenate two vectors?

Data Frames

No one goes through the trouble of learning R to do operations to manipulate one-dimensional vectors. You learn R to analyze datasets. In R, most datasets are represented as a kind of object called a data frame. Data frames are two dimensional arrays of information divided into rows and columns.

The data frame mtcars is included with R, and illustrates how data frames are organized.

mtcars
##                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
## Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
## Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
## Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
## Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
## Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
## Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
## Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
## Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
## Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
## Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
## AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
## Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
## Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
## Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
## Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
## Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
## Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

This data frame has 32 rows and 11 columns. The columns include information such as horsepower, engine cylinders, and weight. Note that the car names aren’t actually a column. In R, data frames can store a string as the name of each row. If the rows aren’t named, R will simply number them sequentially instead.
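You can retrieve the row names themselves with the rownames() function:

rownames(mtcars)[1:3]
## [1] "Mazda RX4"     "Mazda RX4 Wag" "Datsun 710"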

In looking at a data frame, you will often want to examine only one part at a time. For example, you might only want to look at one column. To examine a single column of a data frame, use the $ operator.

mtcars$mpg
##  [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4
## [16] 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7
## [31] 15.0 21.4

Or you might just want to look at the Dodge Challenger row in mtcars. This kind of subsetting is achieved via square brackets.

mtcars["Dodge Challenger",]
##                   mpg cyl disp  hp drat   wt  qsec vs am gear carb
## Dodge Challenger 15.5   8  318 150 2.76 3.52 16.87  0  0    3    2

When using brackets to subset a dataframe, two values are required, separated by a comma. The first indicates the rows to be selected, and the second indicates the columns. Leaving either blank tells R to include all of them. Let’s say you wanted to see only the Toyotas, and only miles per gallon and weight.

mtcars[c("Toyota Corolla", "Toyota Corona"), c("mpg", "wt")]
##                 mpg    wt
## Toyota Corolla 33.9 1.835
## Toyota Corona  21.5 2.465

Notice that you have to use the c() function to indicate that you want to select more than one row or column. That’s because the brackets accept exactly two arguments: rows and columns. Trying to select rows like this will yield an error, because you are giving three arguments (one of which is blank) to a function that only accepts two.

mtcars["Toyota Corolla", "Toyota Corona",]

You can also use index numbers to subset. This means using the row and column position to select your subset. These two different ways of subsetting will produce identical output.

mtcars[c("Toyota Corolla", "Toyota Corona"), c("mpg", "wt")]
##                 mpg    wt
## Toyota Corolla 33.9 1.835
## Toyota Corona  21.5 2.465
mtcars[c(20, 21), c(1,6)]
##                 mpg    wt
## Toyota Corolla 33.9 1.835
## Toyota Corona  21.5 2.465

Subsetting with index numbers is often faster, but it can come back to bite you. If you alter your dataset, such as by adding columns or dropping observations, your row and column index numbers can change, and code that worked at one point will stop working. Knowing how to use index numbers is an important part of R literacy, but there are often better ways to do things.
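Here is a small illustration of the pitfall (mtcars.sorted is just a name made up for this example):

mtcars[order(mtcars$mpg),] -> mtcars.sorted #Sort the rows by miles per gallon

mtcars.sorted[20, 1] #Row 20 is no longer the Toyota Corolla
## [1] 21
mtcars.sorted["Toyota Corolla", "mpg"] #The name-based version still works
## [1] 33.9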

Exercise 1

How do you think you subset a vector?

Data frames can be stored in exactly the same way that vectors or elements can. If we wanted to create a new data frame to just look at the Toyotas, we would just assign our subsetting to a name.

mtcars[c(20, 21), c(1,6)] -> df.toyotas #You can call it whatever you want. I use this format for keeping track of objects I've created.

Getting Data

Just as you’re not learning R to play with vectors, you’re also not learning it to analyze mtcars. You want to analyze interesting data. To do that, you need to find some data. There are a million sources, but most data you find will be in one of three formats: a .csv, a STATA file, or an R file.

A .csv is a “comma-separated values” dataset. Essentially, it’s a text file that uses commas to indicate where a new value begins. It’s a very simple, very widely usable data format. There are two main ways to get a .csv. First, you can use R to read it directly from a URL. For example, the Correlates of State Policy dataset maintained at Michigan State University, which we will be playing with a lot, is stored here: https://ippsr.msu.edu/sites/default/files/correlatesofstatepolicyprojectv2_2.csv.
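For example, the raw text of a small (hypothetical) .csv might look like this:

name,mpg,cyl
Mazda RX4,21.0,6
Datsun 710,22.8,4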

To read this dataset directly from the internet, use

read.csv("https://ippsr.msu.edu/sites/default/files/correlatesofstatepolicyprojectv2_2.csv")

As you can see, this is a very large dataset, with over 2,000 variables. Note that when you enter the code to read the dataset, R tries to display the entire dataset in the console. Since the data are contained in 6,120 × 2,091 = 12,796,920 cells, it just lists all the variable names instead.

The other way to access these data is to download the file, and load it from your hard drive. Download the .csv, save it on your hard drive (maybe make a new “Data” folder), and load it like this:

read.csv("~/Downloads/correlatesofstatepolicyprojectv2_2.csv") #Note: your path to file will be different depending on your operating system and where you save the file.

Sometimes it’s better to save a dataset on your computer, and sometimes it’s better to read it off the internet. If it’s a dataset that’s regularly updated, it’s often easier to just load the data off the internet instead of always re-downloading it. Similarly, if your code just uses a URL to grab the data, you can run your code on another computer on which you haven’t downloaded the data, and it will still work. On the other hand, there are a lot of datasets that can’t be accessed like this, and just need to be downloaded.

In either of these cases, this code will simply output the data frame in your console. In order to save it as an object you can then analyze and manipulate, you will want to assign it a name.

df.cspp <- read.csv("https://ippsr.msu.edu/sites/default/files/correlatesofstatepolicyprojectv2_2.csv")

Now you can subset this in exactly the same way as mtcars. For big datasets like this, it’s often advantageous to subset it to only the variables you need. For example, let’s say you just wanted year, state, and population density from this dataset.

df.cspp[,c("year", "state", "popdensity")]
##    year                state popdensity
## 1  1900              Alabama         NA
## 2  1900               Alaska         NA
## 3  1900              Arizona         NA
## 4  1900             Arkansas         NA
## 5  1900           California         NA
## 6  1900             Colorado         NA
## 7  1900          Connecticut         NA
## 8  1900             Delaware         NA
## 9  1900 District of Columbia         NA
## 10 1900              Florida         NA
## 11 1900              Georgia         NA
## 12 1900               Hawaii         NA
## 13 1900                Idaho         NA
## 14 1900             Illinois         NA
## 15 1900              Indiana         NA
## 16 1900                 Iowa         NA
## 17 1900               Kansas         NA
## 18 1900             Kentucky         NA
## 19 1900            Louisiana         NA
## 20 1900                Maine         NA
## 21 1900             Maryland         NA
## 22 1900        Massachusetts         NA
## 23 1900             Michigan         NA
## 24 1900            Minnesota         NA
## 25 1900          Mississippi         NA
## 26 1900             Missouri         NA
## 27 1900              Montana         NA
## 28 1900             Nebraska         NA
## 29 1900               Nevada         NA
## 30 1900        New Hampshire         NA
## 31 1900           New Jersey         NA
## 32 1900           New Mexico         NA
## 33 1900             New York         NA
## 34 1900       North Carolina         NA
## 35 1900         North Dakota         NA
## 36 1900                 Ohio         NA
## 37 1900             Oklahoma         NA
## 38 1900               Oregon         NA
## 39 1900         Pennsylvania         NA
## 40 1900         Rhode Island         NA
## 41 1900       South Carolina         NA
## 42 1900         South Dakota         NA
## 43 1900            Tennessee         NA
## 44 1900                Texas         NA
## 45 1900                 Utah         NA
## 46 1900              Vermont         NA
## 47 1900             Virginia         NA
## 48 1900           Washington         NA
## 49 1900        West Virginia         NA
## 50 1900            Wisconsin         NA

We can then store it as a new object.

df.cspp[,c("year", "state", "popdensity")] -> df.cspp.trimmed

Loading STATA data (which are stored as .dta files) is slightly more complicated, and brings us to the last of the fundamental R concepts we need to cover: libraries.

As a software environment, R comes with lots and lots of functions. But R is a Turing-complete programming language, which means anything that can be done on a computer can be built with R. If you wanted to, you could build Fortnite from scratch using nothing but R code. That means that what R can theoretically do is unlimited. Moreover, R is free and open-source, meaning anyone can write functions for R. These functions are grouped together in packages (you will also hear them called libraries). For example, the package “tidycensus” provides functions that make it very easy to download Census tables directly into R.

To install tidycensus, use the following code:

install.packages("tidycensus")

Once a package is installed, it’s available to use in R. However, R will not load every library you’ve installed in every R session. That would be a waste of memory, and would slow the program down considerably. Instead, R waits for you to tell it what libraries you need to use. You do that with the library() function.

library(tidycensus)

Most people put all of the library commands they need to use at the top of their R script. We’ll see what that looks like a little later.
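For example, the top of a script might look like this (the particular packages here are just for illustration):

library(tidycensus)
library(haven)

#The rest of the script follows...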

To return to reading STATA files, you need a library to load them. There are a lot of different options, but I prefer the library “haven”. Install it and load it in your R session now.
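Following the same pattern as tidycensus:

install.packages("haven")
library(haven)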

“Haven” provides the function read_dta() for reading STATA data files. This code will read the Comparative Welfare States dataset from the Luxembourg Income Study website.

read_dta("http://www.lisdatacenter.org/wp-content/uploads/CWS-stata-2020.dta")
## # A tibble: 50 x 400
##    id      idn  year lisrpr_tot lisrpr_child lisrpr_eld lisrpr_tpf lisrpr_smf
##    <chr> <dbl> <dbl>      <dbl>        <dbl>      <dbl>      <dbl>      <dbl>
##  1 AUL       1  1960         NA           NA         NA         NA         NA
##  2 AUL       1  1961         NA           NA         NA         NA         NA
##  3 AUL       1  1962         NA           NA         NA         NA         NA
##  4 AUL       1  1963         NA           NA         NA         NA         NA
##  5 AUL       1  1964         NA           NA         NA         NA         NA
##  6 AUL       1  1965         NA           NA         NA         NA         NA
##  7 AUL       1  1966         NA           NA         NA         NA         NA
##  8 AUL       1  1967         NA           NA         NA         NA         NA
##  9 AUL       1  1968         NA           NA         NA         NA         NA
## 10 AUL       1  1969         NA           NA         NA         NA         NA
## # … with 40 more rows, and 392 more variables: pct_csmf <dbl>, pre_tot <dbl>,
## #   pre_singm <dbl>, pre_eld <dbl>, pre_ue <dbl>, post_ue <dbl>,
## #   pregini_2559 <dbl>, postgini_2559 <dbl>, pregini_1864 <dbl>,
## #   postgini_1864 <dbl>, lisgini <dbl>, lisd9010 <dbl>, lisd9050 <dbl>,
## #   lisd8020 <dbl>, mktmeasure <dbl>, postginioecd <dbl>, preginioecd <dbl>,
## #   ginimkt_1865 <dbl>, ginidisp_1865 <dbl>, p9010de <dbl>, p8020de <dbl>,
## #   mgini <dbl>, ngini <dbl>, rred <dbl>, abred <dbl>, top1 <dbl>, top10 <dbl>,
## #   mid <dbl>, bottom <dbl>, top1share <dbl>, top1sharec <dbl>,
## #   top1tenshare <dbl>, top1tensharec <dbl>, miwsenc <dbl>, compens <dbl>,
## #   earnprod <dbl>, wages <dbl>, lowpay <dbl>, p90p50 <dbl>, p50p10 <dbl>,
## #   p90p50v2 <dbl>, p50p10v2 <dbl>, p90p10 <dbl>, sstaxes <dbl>, pytaxes <dbl>,
## #   topmtax3 <dbl>, gen <dbl>, uegen <dbl>, sickgen <dbl>, pengen <dbl>,
## #   usucavg <dbl>, ssscavg <dbl>, unempsi <dbl>, unempco <dbl>, sstran <dbl>,
## #   hlpub <dbl>, totheal <dbl>, ptotheal <dbl>, phealgdp <dbl>, pcrexnc <dbl>,
## #   tcrexnc <dbl>, pinpat <dbl>, tinpat <dbl>, poupat <dbl>, toupat <dbl>,
## #   tmedcv <dbl>, pmedcv <dbl>, inpatcv <dbl>, outpatcv <dbl>, daycare <dbl>,
## #   oldage_pub <dbl>, survivor_pub <dbl>, incap_pub <dbl>, health_pub <dbl>,
## #   family_pub <dbl>, almp_pub <dbl>, unemp_pub <dbl>, housing_pub <dbl>,
## #   other_pub <dbl>, socx_pub <dbl>, oldage_pmp <dbl>, survivor_pmp <dbl>,
## #   incap_pmp <dbl>, health_pmp <dbl>, family_pmp <dbl>, almp_pmp <dbl>,
## #   unemp_pmp <dbl>, housing_pmp <dbl>, other_pmp <dbl>, socx_pmp <dbl>,
## #   almppes <dbl>, almptrain <dbl>, almpjob <dbl>, almpincen <dbl>,
## #   almprehab <dbl>, almpdir <dbl>, almpstart <dbl>, unemsup <dbl>,
## #   earlyretire <dbl>, edspendtw <dbl>, …

Now save this dataset by assigning it a name.
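For example (df.cws is just one possible name, following the naming convention used above):

df.cws <- read_dta("http://www.lisdatacenter.org/wp-content/uploads/CWS-stata-2020.dta")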

Finally, sometimes data are stored in files created by R. These files use the .rda extension. Download the file “presidential_precincts_2016.rda” from the Harvard Dataverse. To load the data into R, use this code (once again, you’ll need to change the path to where you saved the file).

load("~/Downloads/presidential_precincts_2016.rda")

When you load an .rda file, you don’t need to assign it a name. When these files are created, object names are assigned to them and those names automatically load.
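If you want to see which objects an .rda file created, you can capture the return value of load(), which is a character vector of the names of the loaded objects:

loaded.names <- load("~/Downloads/presidential_precincts_2016.rda")
loaded.names #Prints the name(s) of the object(s) the file contained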

Finally, when you have a data frame saved (so that it shows up in the Environment pane in the upper right of RStudio), you can browse the data by clicking on it. You can do the same thing with this code.

View(df1) #Replace df1 with the name of your data frame

Now, use this code to clear your saved objects. Let’s get on with the actually interesting use of R!

rm(list=ls())

We’ve Only Just Begun (To Do Data Carpentry)

Most of what you do in R will not be analysis in the sense of running regressions or other statistical tests, or even making graphs. Most of the work is what makes those things possible. This work is called data carpentry or data cleaning. Data carpentry involves everything from subsetting data to recoding variables to transforming data to merging datasets. It is the fundamental work that makes data analysis possible. It is also what will probably create the most headaches for you.

To learn data carpentry, we’re going to focus on two datasets: the Correlates of State Policy Project dataset that we looked at above, and the General Social Survey (GSS). You already know how to load the CSPP dataset. To load the GSS, load the “foreign” library and use its read.dta() function to store the dataset.

library(foreign)
read.dta("~/Stats/Data/GSS/GSS7218_R1.dta") -> df.gss #As before, your path will depend on where you saved the file.

There are multiple ways to do anything you want to do in R. Sometimes one way is genuinely better, and sometimes it’s purely a matter of aesthetics which one you choose. In this section, we’re going to look at some basic data carpentry using “vanilla” R, and then we will move into data carpentry with the tidyverse, a set of packages that work together extremely well to allow you to write flexible and readable R code.

Data Carpentry in Base R

So far, we’ve seen how to subset a dataset using square brackets. But the subsetting techniques we’ve looked at haven’t been very useful. Often, you want to subset based on a variable’s value. For example, you might only want to look at black respondents in a survey. Or you might only be interested in years after 1989 in some panel data. This can be done fairly simply via square brackets.

Let’s look at the CSPP data (since we deleted all of our objects, you’ll need to rerun the code to save the data). If you browse the data frame, you can see it begins in the year 1900. Let’s say we’re only interested in data after 1970. In that case, we’ll want to drop all observations (i.e. rows) in which the year is before 1970. To do that, we need to use logical elements.

Logical elements are comparable to numeric or string elements, but they can only take two values: TRUE and FALSE. There are lots of expressions you can enter into R and have it evaluate whether they are true or false. Three of the most common operators are >, <, and ==.

5>1
## [1] TRUE
1>5
## [1] FALSE
1==5
## [1] FALSE
5==5
## [1] TRUE
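These comparisons also work element by element on vectors, which is what makes them useful for subsetting:

c(1, 5, 10) > 4
## [1] FALSE  TRUE  TRUE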

To subset our dataset to only include 1970 and later, we want to write an expression that will be TRUE for those years, and FALSE for any others. We’re also going to drop all the columns except year, st, and unemployment.

df.cspp[df.cspp$year>1969, c("year", "st", "unemployment")]
##      year st unemployment
## 3571 1970 AL           NA
## 3572 1970 AK           NA
## 3573 1970 AZ           NA
## 3574 1970 AR           NA
## 3575 1970 CA           NA
## 3576 1970 CO           NA
## 3577 1970 CT           NA
## 3578 1970 DE           NA
## 3579 1970 DC           NA
## 3580 1970 FL           NA
## 3581 1970 GA           NA
## 3582 1970 HI           NA
## 3583 1970 ID           NA
## 3584 1970 IL           NA
## 3585 1970 IN           NA
## 3586 1970 IA           NA
## 3587 1970 KS           NA
## 3588 1970 KY           NA
## 3589 1970 LA           NA
## 3590 1970 ME           NA
## 3591 1970 MD           NA
## 3592 1970 MA           NA
## 3593 1970 MI           NA
## 3594 1970 MN           NA
## 3595 1970 MS           NA
## 3596 1970 MO           NA
## 3597 1970 MT           NA
## 3598 1970 NE           NA
## 3599 1970 NV           NA
## 3600 1970 NH           NA
## 3601 1970 NJ           NA
## 3602 1970 NM           NA
## 3603 1970 NY           NA
## 3604 1970 NC           NA
## 3605 1970 ND           NA
## 3606 1970 OH           NA
## 3607 1970 OK           NA
## 3608 1970 OR           NA
## 3609 1970 PA           NA
## 3610 1970 RI           NA
## 3611 1970 SC           NA
## 3612 1970 SD           NA
## 3613 1970 TN           NA
## 3614 1970 TX           NA
## 3615 1970 UT           NA
## 3616 1970 VT           NA
## 3617 1970 VA           NA
## 3618 1970 WA           NA
## 3619 1970 WV           NA
## 3620 1970 WI           NA

Try running the expression we used to subset.

df.cspp$year>1969

The result is a vector of logical elements the same length as the vector df.cspp$year. For each element in that vector, R evaluates whether it is greater than 1969, and outputs either TRUE or FALSE depending on the answer. When this vector is placed inside the brackets, R goes through the rows of the dataset and checks whether the vector’s corresponding element (1st element of the vector compared with 1st row of the dataset, 2nd with 2nd, etc.) is TRUE or FALSE. If it’s TRUE, R keeps the row. If it’s FALSE, R drops it.
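You can see the same mechanism on a small scale with a toy vector (the vector v here is made up for illustration):

```r
# A toy vector of years and a logical test on it
v <- c(1965, 1972, 1980, 1955)
v > 1969
## [1] FALSE  TRUE  TRUE FALSE

# Placing the logical vector inside brackets keeps only the TRUE positions
v[v > 1969]
## [1] 1972 1980
```

Subsetting a dataframe by rows works exactly the same way; the logical vector just decides which rows survive.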

You can also use ==. Note that == (the equality test) is different from = (which does assignment). You will forget this a lot, and type = when you should use ==. If we wanted to drop all observations except the year 1999, we would do the following.

df.cspp[df.cspp$year==1999, c("year", "st", "unemployment")]
##      year st unemployment
## 5050 1999 AL          4.8
## 5051 1999 AK          6.4
## 5052 1999 AZ          4.4
## 5053 1999 AR          4.5
## 5054 1999 CA          5.2
## 5055 1999 CO          2.9
## 5056 1999 CT          3.2
## 5057 1999 DE          3.5
## 5058 1999 DC           NA
## 5059 1999 FL          3.9
## 5060 1999 GA          4.0
## 5061 1999 HI          5.6
## 5062 1999 ID          5.2
## 5063 1999 IL          4.3
## 5064 1999 IN          3.0
## 5065 1999 IA          2.5
## 5066 1999 KS          3.0
## 5067 1999 KY          4.5
## 5068 1999 LA          5.1
## 5069 1999 ME          4.1
## 5070 1999 MD          3.5
## 5071 1999 MA          3.2
## 5072 1999 MI          3.8
## 5073 1999 MN          2.8
## 5074 1999 MS          5.1
## 5075 1999 MO          3.4
## 5076 1999 MT          5.2
## 5077 1999 NE          2.9
## 5078 1999 NV          4.4
## 5079 1999 NH          2.7
## 5080 1999 NJ          4.6
## 5081 1999 NM          5.6
## 5082 1999 NY          5.2
## 5083 1999 NC          3.2
## 5084 1999 ND          3.4
## 5085 1999 OH          4.3
## 5086 1999 OK          3.4
## 5087 1999 OR          5.7
## 5088 1999 PA          4.4
## 5089 1999 RI          4.1
## 5090 1999 SC          4.5
## 5091 1999 SD          2.9
## 5092 1999 TN          4.0
## 5093 1999 TX          4.6
## 5094 1999 UT          3.7
## 5095 1999 VT          3.0
## 5096 1999 VA          2.8
## 5097 1999 WA          4.7
## 5098 1999 WV          6.6
## 5099 1999 WI          3.0
## 5100 1999 WY          4.9

The opposite of == is != (does not equal). If we wanted to dump Illinois from our dataset, to spite the flatlanders, we would use this code.

df.cspp[df.cspp$st != "IL", c("year", "st", "unemployment")]

We can also use the & operator to combine two conditions. R will then evaluate both of them, and return TRUE only if both of them are true. Let’s say we wanted to look at Wisconsin after 1980.

df.cspp[df.cspp$st=="WI" & df.cspp$year>1979, c("year", "st", "unemployment")]
##      year st unemployment
## 4130 1980 WI          7.0
## 4181 1981 WI          7.8
## 4232 1982 WI         10.7
## 4283 1983 WI         10.4
## 4334 1984 WI          7.3
## 4385 1985 WI          7.2
## 4436 1986 WI          7.0
## 4487 1987 WI          6.1
## 4538 1988 WI          4.2
## 4589 1989 WI          4.4
## 4640 1990 WI          4.4
## 4691 1991 WI          5.4
## 4742 1992 WI          5.1
## 4793 1993 WI          4.7
## 4844 1994 WI          4.7
## 4895 1995 WI           NA
## 4946 1996 WI          3.5
## 4997 1997 WI          3.7
## 5048 1998 WI          3.4
## 5099 1999 WI          3.0
## 5150 2000 WI          3.5
## 5201 2001 WI          4.6
## 5252 2002 WI          5.5
## 5303 2003 WI          5.6
## 5354 2004 WI          5.0
## 5405 2005 WI          4.7
## 5456 2006 WI          4.7
## 5507 2007 WI          4.9
## 5558 2008 WI          4.9
## 5609 2009 WI          8.6
## 5660 2010 WI          8.7
## 5711 2011 WI          7.8
## 5762 2012 WI          7.0
## 5813 2013 WI          6.7
## 5864 2014 WI          5.4
## 5915 2015 WI          4.6
## 5966 2016 WI          4.1
## 6017 2017 WI          3.3
## 6068 2018 WI           NA
## 6119 2019 WI           NA

Finally, there’s the “or” operator, | (that’s shift + the backslash key). If we wanted to look just at the years 1990 and 2000, we could use this operator. Note that | is used in combination with two other expressions that can be evaluated as TRUE or FALSE.

df.cspp[df.cspp$year==1990 | df.cspp$year==2000, c("year", "st", "unemployment")]
##      year st unemployment
## 4591 1990 AL          6.8
## 4592 1990 AK          6.9
## 4593 1990 AZ          5.3
## 4594 1990 AR          6.9
## 4595 1990 CA          5.6
## 4596 1990 CO          4.9
## 4597 1990 CT          5.1
## 4598 1990 DE          5.1
## 4599 1990 DC           NA
## 4600 1990 FL          5.9
## 4601 1990 GA          5.4
## 4602 1990 HI          2.8
## 4603 1990 ID          5.8
## 4604 1990 IL          6.2
## 4605 1990 IN          5.3
## 4606 1990 IA          4.2
## 4607 1990 KS          4.0
## 4608 1990 KY          5.8
## 4609 1990 LA          6.2
## 4610 1990 ME          5.1
## 4611 1990 MD          4.6
## 4612 1990 MA          6.0
## 4613 1990 MI          7.5
## 4614 1990 MN          4.8
## 4615 1990 MS          7.5
## 4616 1990 MO          5.7
## 4617 1990 MT          5.8
## 4618 1990 NE          2.2
## 4619 1990 NV          4.9
## 4620 1990 NH          5.6
## 4621 1990 NJ          5.0
## 4622 1990 NM          6.3
## 4623 1990 NY          5.2
## 4624 1990 NC          4.1
## 4625 1990 ND          3.9
## 4626 1990 OH          5.7
## 4627 1990 OK          5.6
## 4628 1990 OR          5.5
## 4629 1990 PA          5.4
## 4630 1990 RI          6.7
## 4631 1990 SC          4.7
## 4632 1990 SD          3.7
## 4633 1990 TN          5.2
## 4634 1990 TX          6.2
## 4635 1990 UT          4.3
## 4636 1990 VT          5.0
## 4637 1990 VA          4.3
## 4638 1990 WA          4.9
## 4639 1990 WV          8.3
## 4640 1990 WI          4.4
## 4641 1990 WY          5.4
## 5101 2000 AL          4.6
## 5102 2000 AK          6.6
## 5103 2000 AZ          3.9
## 5104 2000 AR          4.4
## 5105 2000 CA          4.9
## 5106 2000 CO          2.7
## 5107 2000 CT          2.3
## 5108 2000 DE          4.0
## 5109 2000 DC           NA
## 5110 2000 FL          3.6
## 5111 2000 GA          3.7
## 5112 2000 HI          4.3
## 5113 2000 ID          4.9
## 5114 2000 IL          4.4
## 5115 2000 IN          3.2
## 5116 2000 IA          2.6
## 5117 2000 KS          3.7
## 5118 2000 KY          4.1
## 5119 2000 LA          5.5
## 5120 2000 ME          3.5
## 5121 2000 MD          3.9
## 5122 2000 MA          2.6
## 5123 2000 MI          3.6
## 5124 2000 MN          3.3
## 5125 2000 MS          5.7
## 5126 2000 MO          3.5
## 5127 2000 MT          4.9
## 5128 2000 NE          3.0
## 5129 2000 NV          4.1
## 5130 2000 NH          2.8
## 5131 2000 NJ          3.8
## 5132 2000 NM          4.9
## 5133 2000 NY          4.6
## 5134 2000 NC          3.6
## 5135 2000 ND          3.0
## 5136 2000 OH          4.1
## 5137 2000 OK          3.0
## 5138 2000 OR          4.9
## 5139 2000 PA          4.2
## 5140 2000 RI          4.1
## 5141 2000 SC          3.9
## 5142 2000 SD          2.3
## 5143 2000 TN          3.9
## 5144 2000 TX          4.2
## 5145 2000 UT          3.2
## 5146 2000 VT          2.9
## 5147 2000 VA          2.2
## 5148 2000 WA          5.2
## 5149 2000 WV          5.5
## 5150 2000 WI          3.5
## 5151 2000 WY          3.9

Time for some exercises!

Exercise 1

What if we wanted every year in our dataset except the 1980s (Reaganism, hair metal, the rise of Jay Leno — there are lots of reasons to want to forget the 80s!). What code would do it?

Exercise 2

Let’s say we just wanted to see state-years in which unemployment was above 9%. What code would do it?

Exercise 3

What if we wanted to see the 1990s, but not 1997?

Exercise 4

Let’s compare two states. Say, North Carolina and South Carolina. What code would subset the data to only those two states?

R the Right Way: The Tidyverse

As you can see, the code to subset based on more than one condition gets pretty ugly pretty fast. When you combine that with other operations (for example, converting a vector of unemployment rates to decimal notation by dividing them by 100), things can get downright unreadable.

The tidyverse is a set of libraries designed to fix this problem. To install it, use:

install.packages("tidyverse")

Then load it using the library() function.
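Installing only needs to happen once per machine, but loading happens at the start of every session:

```r
# Load the tidyverse so its functions are available in this session
library(tidyverse)
```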

Using Pipes Like the Mario Bros

The most fundamental change the tidyverse introduces is the pipe operator. Pipes look like this: %>%. They allow you to place functions in a line, one after another, instead of nesting them inside one another.

To make this concrete, let’s say we wanted to look at unemployment in Wisconsin. And let’s say we wanted to look at the change in the unemployment rate each year. We can use the diff() function for that, which takes a numeric vector as an argument and calculates the difference from row to row. Finally, let’s say that we’re pretending to be economists, and so we always convert our variables to logarithmic scales to make ourselves feel smart. Since some of the changes are negative, and the logarithm of a non-positive number is undefined, we’ll need to make them all positive, which we can do by adding 10 to all of them.
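To see what diff() does on its own, here is a quick sketch with made-up numbers:

```r
# Each element of the output is an element of the input minus the
# element before it, so the result is one element shorter than the input
diff(c(4.4, 5.4, 5.1))
## [1]  1.0 -0.3
```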

In base R, that looks like this.

log(diff(df.cspp[df.cspp$st=="WI", "unemployment"])+10)
##   [1]       NA       NA       NA       NA       NA       NA       NA       NA
##   [9]       NA       NA       NA       NA       NA       NA       NA       NA
##  [17]       NA       NA       NA       NA       NA       NA       NA       NA
##  [25]       NA       NA       NA       NA       NA       NA       NA       NA
##  [33]       NA       NA       NA       NA       NA       NA       NA       NA
##  [41]       NA       NA       NA       NA       NA       NA       NA       NA
##  [49]       NA       NA       NA       NA       NA       NA       NA       NA
##  [57]       NA       NA       NA       NA       NA       NA       NA       NA
##  [65]       NA       NA       NA       NA       NA       NA       NA       NA
##  [73]       NA       NA       NA 2.163323 2.230014 2.322388 2.240710 2.525729
##  [81] 2.379546 2.557227 2.272126 1.931521 2.292535 2.282382 2.208274 2.091864
##  [89] 2.322388 2.302585 2.397895 2.272126 2.261763 2.302585       NA       NA
##  [97] 2.322388 2.272126 2.261763 2.351375 2.406945 2.388763 2.312535 2.240710
## [105] 2.272126 2.302585 2.322388 2.302585 2.617396 2.312535 2.208274 2.219203
## [113] 2.272126 2.163323 2.219203 2.251292 2.219203       NA       NA

It’s pretty hideous to read. What object are you actually starting with here? It takes a few seconds to even figure it out. This is where pipes come to the rescue.

df.cspp[df.cspp$st=="WI", "unemployment"] %>% #note: you can put lots of pipes on the same line, but it's often more readable not to
  diff() %>%
  + 10 %>%
  log()
##   [1]       NA       NA       NA       NA       NA       NA       NA       NA
##   [9]       NA       NA       NA       NA       NA       NA       NA       NA
##  [17]       NA       NA       NA       NA       NA       NA       NA       NA
##  [25]       NA       NA       NA       NA       NA       NA       NA       NA
##  [33]       NA       NA       NA       NA       NA       NA       NA       NA
##  [41]       NA       NA       NA       NA       NA       NA       NA       NA
##  [49]       NA       NA       NA       NA       NA       NA       NA       NA
##  [57]       NA       NA       NA       NA       NA       NA       NA       NA
##  [65]       NA       NA       NA       NA       NA       NA       NA       NA
##  [73]       NA       NA       NA 2.163323 2.230014 2.322388 2.240710 2.525729
##  [81] 2.379546 2.557227 2.272126 1.931521 2.292535 2.282382 2.208274 2.091864
##  [89] 2.322388 2.302585 2.397895 2.272126 2.261763 2.302585       NA       NA
##  [97] 2.322388 2.272126 2.261763 2.351375 2.406945 2.388763 2.312535 2.240710
## [105] 2.272126 2.302585 2.322388 2.302585 2.617396 2.312535 2.208274 2.219203
## [113] 2.272126 2.163323 2.219203 2.251292 2.219203       NA       NA

Note that these two pieces of code produce exactly the same output. But the second chunk is much easier to read, because you can separate out every step of what you’re doing, instead of trying to find the innermost function and work backwards from there. From here on out, I’ll be using pipes for basically everything.

Subsetting and Selecting

The tidyverse doesn’t just give us a nicer way to write our code here. It also gives us functions that allow us to subset a lot more effectively. The two key functions are select() and filter().

Select is simpler. It simply allows you to select which columns from a dataframe you’d like to keep. We’ve been using some rather clunky base R nomenclature to keep just the state, year, and unemployment columns from df.cspp. Using select, it’s a lot easier to read.

df.cspp %>% select(year,
                   st,
                   unemployment)
##    year st unemployment
## 1  1900 AL           NA
## 2  1900 AK           NA
## 3  1900 AZ           NA
## 4  1900 AR           NA
## 5  1900 CA           NA
## 6  1900 CO           NA
## 7  1900 CT           NA
## 8  1900 DE           NA
## 9  1900 DC           NA
## 10 1900 FL           NA
## 11 1900 GA           NA
## 12 1900 HI           NA
## 13 1900 ID           NA
## 14 1900 IL           NA
## 15 1900 IN           NA
## 16 1900 IA           NA
## 17 1900 KS           NA
## 18 1900 KY           NA
## 19 1900 LA           NA
## 20 1900 ME           NA
## 21 1900 MD           NA
## 22 1900 MA           NA
## 23 1900 MI           NA
## 24 1900 MN           NA
## 25 1900 MS           NA
## 26 1900 MO           NA
## 27 1900 MT           NA
## 28 1900 NE           NA
## 29 1900 NV           NA
## 30 1900 NH           NA
## 31 1900 NJ           NA
## 32 1900 NM           NA
## 33 1900 NY           NA
## 34 1900 NC           NA
## 35 1900 ND           NA
## 36 1900 OH           NA
## 37 1900 OK           NA
## 38 1900 OR           NA
## 39 1900 PA           NA
## 40 1900 RI           NA
## 41 1900 SC           NA
## 42 1900 SD           NA
## 43 1900 TN           NA
## 44 1900 TX           NA
## 45 1900 UT           NA
## 46 1900 VT           NA
## 47 1900 VA           NA
## 48 1900 WA           NA
## 49 1900 WV           NA
## 50 1900 WI           NA

You can also use select to rename variables. This is often necessary, because a lot of datasets use totally non-descriptive names like VCF01094 for their variables. In this case, let’s say we wanted to change “st” to “state.”

df.cspp %>% select(year,
                   state=st,
                   unemployment)

When renaming, the new name goes before the equal sign, and the old name after. Note that in select(), unlike when using square brackets, you don’t need to put column names in quotes. Since select() is only ever used on column names, R just treats whatever you enter as a column name.
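If you want to rename a column without dropping all the others, the tidyverse also provides a rename() function that uses the same new = old syntax (a sketch, assuming df.cspp is loaded as above):

```r
# rename() keeps every column; only st changes to state
df.cspp %>% rename(state = st)
```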

The filter() function then allows us to also subset based on variable values. Its logic is fundamentally the same as square brackets. It evaluates statements as TRUE or FALSE and then drops rows for which the condition(s) are false.

df.cspp %>% select(year,
                   state=st,
                   unemployment) %>%
  filter(year==1999)
##    year state unemployment
## 1  1999    AL          4.8
## 2  1999    AK          6.4
## 3  1999    AZ          4.4
## 4  1999    AR          4.5
## 5  1999    CA          5.2
## 6  1999    CO          2.9
## 7  1999    CT          3.2
## 8  1999    DE          3.5
## 9  1999    DC           NA
## 10 1999    FL          3.9
## 11 1999    GA          4.0
## 12 1999    HI          5.6
## 13 1999    ID          5.2
## 14 1999    IL          4.3
## 15 1999    IN          3.0
## 16 1999    IA          2.5
## 17 1999    KS          3.0
## 18 1999    KY          4.5
## 19 1999    LA          5.1
## 20 1999    ME          4.1
## 21 1999    MD          3.5
## 22 1999    MA          3.2
## 23 1999    MI          3.8
## 24 1999    MN          2.8
## 25 1999    MS          5.1
## 26 1999    MO          3.4
## 27 1999    MT          5.2
## 28 1999    NE          2.9
## 29 1999    NV          4.4
## 30 1999    NH          2.7
## 31 1999    NJ          4.6
## 32 1999    NM          5.6
## 33 1999    NY          5.2
## 34 1999    NC          3.2
## 35 1999    ND          3.4
## 36 1999    OH          4.3
## 37 1999    OK          3.4
## 38 1999    OR          5.7
## 39 1999    PA          4.4
## 40 1999    RI          4.1
## 41 1999    SC          4.5
## 42 1999    SD          2.9
## 43 1999    TN          4.0
## 44 1999    TX          4.6
## 45 1999    UT          3.7
## 46 1999    VT          3.0
## 47 1999    VA          2.8
## 48 1999    WA          4.7
## 49 1999    WV          6.6
## 50 1999    WI          3.0
## 51 1999    WY          4.9

If you want to subset based on multiple conditions with filter(), just separate them with a comma. Let’s say we want to see state-years after 1980 where unemployment was below 5%.

df.cspp %>% select(year,
                   state=st,
                   unemployment) %>%
  filter(year>1980,
         unemployment<5)

Now let’s do the same exercises we did in base R using the tidyverse.

Exercise 1

What if we wanted every year in our dataset except the 1980s (Reaganism, hair metal, the rise of Jay Leno — there are lots of reasons to want to forget the 80s!). What code would do it?

Exercise 2

Let’s say we just wanted to see state-years in which unemployment was above 9%. What code would do it?

Exercise 3

What if we wanted to see the 1990s, but not 1997?

Exercise 4

Let’s compare two states. Say, North Carolina and South Carolina. What code would subset the data to only those two states?

Transforming Data

Subsetting data is a very important skill. You’ll probably filter on a variable value or select only certain variables in every data set you ever analyze, and most of what you’ll need to know is contained above.

You’ll spend much more time transforming data. Data can be transformed in lots of ways. Variables can be recoded: you can take a numerical variable, like income, and recode it into buckets (low income, middle income, high income), making it categorical. You can generate new variables based on the values of several existing variables (is a survey respondent white with an income in the bottom third of the income distribution? Make a new variable, white working class, that gives those people a 1 and everyone else a 0). You can pivot data around a variable value (for example, if you have individual-level survey data with a geographic residence variable like state, you can transform your data into state-level summaries of other variables). You can also make cross-tabs, which display the distribution of two variables’ values across each other. We’re only going to be able to skim the surface of data transformation here, but we’ll cover the fundamentals.
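As a small taste of recoding, base R’s cut() function performs exactly the bucketing described above, turning a numeric variable into a categorical one (the incomes here are made up for illustration):

```r
# Bin made-up incomes into three labeled buckets
income <- c(18000, 52000, 95000, 240000)
cut(income,
    breaks = c(0, 40000, 100000, Inf),
    labels = c("low", "middle", "high"))
## [1] low    middle middle high
## Levels: low middle high
```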

The Trouble Solved by Tibbles

One place to begin is with the tibble. A tibble is a kind of dataframe specific to the tidyverse. Pretty much any other dataframe can be made into a tibble. Here is mtcars as a tibble, via the function as_tibble().

mtcars %>%
  as_tibble()
## # A tibble: 32 x 11
##      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
##    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
##  2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
##  3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
##  4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
##  5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
##  6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
##  7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
##  8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2
##  9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
## 10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
## # … with 22 more rows

As you can see, tibbles have two very nice properties. First, when you print them in the console, they show only the first ten rows, and only enough columns to fill your console screen horizontally. That way, you’re not scrolling through 500 rows of data where you can only see some of the column names some of the time. Second, they have little labels in angle brackets telling you what class each variable is.

There are a good number of variable classes in R. The four basic ones are numeric (either doubles, which have decimals, or integers, which don’t), character (strings), logical (TRUE or FALSE), and factors. Factors are the most complicated. A factor is a variable that R treats as categorical, with defined levels. For example, in the GSS, the race variable is coded as “black,” “white,” and “other.” If that were imported into R as a factor, you couldn’t just add “asian american” or something like that to it, the way you could if it were a string. Instead, you’d have to tell R that you’re adding a new level to the factor. A lot of very minor headaches in R come from forgetting whether a variable is currently coded as a character or a factor. This is one reason tibbles are so nice. They remind you!
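A quick sketch of the character-versus-factor difference (hypothetical vectors, not actual GSS data):

```r
race.chr <- c("black", "white", "other")
race.fct <- factor(race.chr)

# Appending a new value to a character vector just works
c(race.chr, "asian american")

# Assigning a value outside a factor's levels yields NA, with a warning
race.fct[4] <- "asian american"

# To do it properly, add the new level first, then assign
levels(race.fct) <- c(levels(race.fct), "asian american")
race.fct[4] <- "asian american"
```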

From here on out, I’m going to convert all dataframes that aren’t tibbles into tibbles. Again, the nice thing about the tidyverse is that it’s very easy to just throw this in a pipe!
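For example, converting df.cspp in place looks like this:

```r
# Convert the dataframe to a tibble and overwrite the original object
df.cspp %>%
  as_tibble() -> df.cspp
```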

Tabling

Very often, we want to summarize a lot of data in a table. The simplest form of this is summarizing a vector. The table() function does this. To play with this function, let’s switch over to the GSS.

Examine it in the console by entering the object’s name (I saved mine as df.gss). As you can see from the environment summary in the upper right of RStudio, df.gss has about 64,000 observations and over 6,000 variables. We are definitely going to want to select only the variables we are interested in if we are analyzing GSS data!

Let’s start by turning our GSS object into a tibble.

df.gss %>%
  as_tibble() -> df.gss

Picking a variable more or less at random, let’s say we wanted to look at a summary of people’s answer to the religion question. The basic GSS religion variable is relig. The table() function will summarize it for us.

df.gss$relig %>% table()
## .
##              protestant                catholic                  jewish 
##                   37117                   15674                    1285 
##                    none                   other                buddhism 
##                    7797                    1086                     198 
##                hinduism           OTHER EASTERN            MOSLEM/ISLAM 
##                     100                      39                     153 
##      ORTHODOX-CHRISTIAN               christian         NATIVE AMERICAN 
##                     118                     791                      31 
## INTER-NONDENOMINATIONAL                      DK                     IAP 
##                     136                       0                       0 
##                      NA 
##                       0

Lots of Protestants and Catholics, and a sprinkling of others. Since these are survey data, raw counts like this aren’t terribly informative. Percentages are what we really want (we are going to ignore issues of survey weighting in this tutorial, since it’s a bit more of a substantive statistical topic). This is what the prop.table() function is for. It takes a table, and renders it as percentages.

df.gss$relig %>% table() %>% prop.table()
## .
##              protestant                catholic                  jewish 
##            0.5752344053            0.2429135994            0.0199147617 
##                    none                   other                buddhism 
##            0.1208368849            0.0168306858            0.0030685781 
##                hinduism           OTHER EASTERN            MOSLEM/ISLAM 
##            0.0015497869            0.0006044169            0.0023711740 
##      ORTHODOX-CHRISTIAN               christian         NATIVE AMERICAN 
##            0.0018287485            0.0122588144            0.0004804339 
## INTER-NONDENOMINATIONAL                      DK                     IAP 
##            0.0021077102            0.0000000000            0.0000000000 
##                      NA 
##            0.0000000000

The GSS as a whole runs from 1972 to 2018, so knowing the distribution of religions across that entire period isn’t terribly informative. It’s probably more useful to get a table for a subset of years, or even a single wave of the survey.

df.gss %>%
  filter(year==2000) %>%
  select(relig) %>%
  table() %>%
  prop.table()
## .
##              protestant                catholic                  jewish 
##            0.5407038749            0.2413793103            0.0223960185 
##                    none                   other                buddhism 
##            0.1414859581            0.0149306790            0.0060433701 
##                hinduism           OTHER EASTERN            MOSLEM/ISLAM 
##            0.0028439389            0.0003554924            0.0042659083 
##      ORTHODOX-CHRISTIAN               christian         NATIVE AMERICAN 
##            0.0042659083            0.0138642019            0.0014219694 
## INTER-NONDENOMINATIONAL                      DK                     IAP 
##            0.0060433701            0.0000000000            0.0000000000 
##                      NA 
##            0.0000000000

Note that in this example I started with the dataset as a whole, rather than starting with the single vector as I did in the previous code chunk. This is because I need to have the year variable in order to filter based on it.

What about a table of two variables? This is called a crosstab. The table() function can also be used to make one. In this case, let’s look at the relationship between marital status and race. Because table() needs to take two arguments here, it might be easiest to do things without a pipe.

table(df.gss$race, df.gss$marital)
##        
##         married widowed divorced separated NEVER MARRIED    NA
##   white   29313    5088     6682      1297          9636     0
##   black    3097     953     1285       771          3073     0
##   other    1719     159      412       174          1128     0
##   IAP         0       0        0         0             0     0

The disadvantage of this simple line of code is that it’s not terribly easy to do it for only a single year. To do that, it’s simplest to store your subsetted dataset as a new object. Let’s also look at proportions here, since again, raw counts aren’t terribly useful in a survey.

df.gss %>%
  filter(year==2000) -> df.gss.2000

table(df.gss.2000$race, df.gss.2000$marital) %>% prop.table()
##        
##             married     widowed    divorced   separated NEVER MARRIED
##   white 0.383167614 0.077059659 0.128196023 0.021661932   0.175781250
##   black 0.042968750 0.017045455 0.021306818 0.015269886   0.055752841
##   other 0.027698864 0.002840909 0.007102273 0.002840909   0.021306818
##   IAP   0.000000000 0.000000000 0.000000000 0.000000000   0.000000000
##        
##                  NA
##   white 0.000000000
##   black 0.000000000
##   other 0.000000000
##   IAP   0.000000000

Note the output from this isn’t immediately easy to interpret. To read a table like this, you need to know whether the first row, first column cell tells us the percentage of whites who are married, or the percentage of married people who are white. The default in prop.table() is to tell us neither; instead, it tells us the total share of observations represented by each cell. For example, 38% of respondents are white and married, while about 5% are black and never married.

To display marginal shares instead, we need to tell prop.table() we want to calculate based on row margins. Take a look at the help file for prop.table().

?prop.table()

We can see that prop.table() takes two arguments: the table generated by table(), and margin. To calculate row percentages (i.e. the percent of whites who are married), we enter a 1. Note: like a lot of functions, prop.table() has essential arguments (the table) and optional arguments (margin). In a pipe, R will always try to send the pipe output into the essential argument. If we want to add an optional argument, we just supply it without specifying the essential one.

table(df.gss.2000$race, df.gss.2000$marital) %>% prop.table(1)
##        
##            married    widowed   divorced  separated NEVER MARRIED         NA
##   white 0.48757343 0.09805694 0.16312698 0.02756439    0.22367826 0.00000000
##   black 0.28205128 0.11188811 0.13986014 0.10023310    0.36596737 0.00000000
##   other 0.44827586 0.04597701 0.11494253 0.04597701    0.34482759 0.00000000
##   IAP

If we wanted column marginals (percentage of married people who are white), we would simply enter 2 instead of 1.
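That looks like this (assuming df.gss.2000 was created as above):

```r
# margin = 2: each column sums to 1, i.e. the share of
# married people who are white, black, etc.
table(df.gss.2000$race, df.gss.2000$marital) %>% prop.table(2)
```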

Exercise 1

How has the distribution of educational attainment changed in the US from the year 1990 to the year 2010? Use the GSS variable “degree” to answer this question. (Hint: make two tables and compare them)

Exercise 2

How are religious fundamentalism and confidence in American business related since 2010? Use the GSS variables “conbus” and “fund” to answer.

Exercise 3

Do people who were born outside of the US place a different value on hard work from people who weren’t? Answer using data from 2018, and the variables “born” and “workhard”.

Pivot Tables

So far, we’ve been looking at data transformations that summarize and compare chunks of existing data. We can also make pivot tables, which reorganize data around the values of certain variables. For example, in our df.cspp dataset, we have unemployment, state, and year data. We can pivot that data on the year variable to find average unemployment for the country as a whole. In the tidyverse, the group_by() and summarize() functions are used for this.

df.cspp %>%
  as_tibble() %>%
  group_by(year) %>%
  filter(is.na(unemployment)==FALSE) %>%
  summarise(avg_unemployment = weighted.mean(unemployment,
                                             poptotal,
                                             na.rm=TRUE))
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 42 x 2
##     year avg_unemployment
##    <int>            <dbl>
##  1  1975             8.50
##  2  1976             7.66
##  3  1977             6.96
##  4  1978             6.02
##  5  1979             5.80
##  6  1980             7.07
##  7  1981             7.66
##  8  1982             9.70
##  9  1983             9.66
## 10  1984             7.63
## # … with 32 more rows

group_by() is an incredibly useful function, and one we’ll be returning to shortly when we talk about recoding variables. It splits your data up into groups, and then does whatever operation you’re telling R to do on each group separately. In this case, we’ve told R that each year is a distinct group. Then, the summarise() function tells R to collapse each group into a single observation using a function. Here, we’ve told R to find the weighted mean of unemployment in each group, weighting by state population (poptotal). We need a weighted mean because New York has far more people than Wisconsin, so its unemployment rate should have a bigger impact on the national average. The na.rm=TRUE tells R to throw out observations with missing data. Some functions do this automatically; others need to be told how to handle it. As always, use the help function ( ?weighted.mean() ) to figure out what’s necessary.
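The group-then-collapse logic is easy to see on a toy tibble (made-up numbers, with an unweighted mean for simplicity):

```r
# Two made-up observations per state, collapsed to one row per state
tibble(state = c("WI", "WI", "NY", "NY"),
       unemployment = c(4, 6, 5, 7)) %>%
  group_by(state) %>%
  summarise(avg = mean(unemployment))
# NY collapses to an average of 6, WI to an average of 5
```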

You can also group based on more than one variable. In that case, groups will be formed by the discrete combinations of values of the variables. For example, if you grouped based on a race variable that was black, white, and other, and on an education variable that was college-educated and non-college-educated, you would have six groups (3 × 2).
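A minimal sketch of this logic, using the built-in mtcars dataset (not one of our training datasets) so it runs on its own: cyl takes three values and am takes two, and every combination appears in the data, so grouping on both yields six groups.

```r
library(dplyr)

# Grouping on two variables: each distinct combination of cyl and am
# that appears in the data becomes its own group.
mtcars %>%
  group_by(cyl, am) %>%
  n_groups()
```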

We can use this to look at race and gender in voting in 2016.

df.gss %>%
  filter(year==2018) %>%
  group_by(race, sex, PRES16) %>%
  tally()
## # A tibble: 40 x 4
## # Groups:   race, sex [6]
##    race  sex    PRES16                        n
##    <fct> <fct>  <fct>                     <int>
##  1 white male   iap                         217
##  2 white male   Clinton                     188
##  3 white male   Trump                       290
##  4 white male   Other candidate (specify)    42
##  5 white male   Didn't vote for president     9
##  6 white male   Don't know                    8
##  7 white male   No answer                    15
##  8 white female iap                         314
##  9 white female Clinton                     278
## 10 white female Trump                       257
## # … with 30 more rows

The tally() function is similar to summarise(), only simpler: it just counts how many observations are in each group you’ve indicated. Once again, since this is a survey, raw counts aren’t very useful. But using the mutate() function, which we will cover in more depth soon, you can easily turn them into percentages.
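Under the hood, tally() is just shorthand for summarise(n = n()). A sketch on the built-in mtcars dataset (to keep it self-contained) shows the two produce the same counts:

```r
library(dplyr)

# tally() is shorthand for summarise(n = n()); these give identical results
counts.tally <- mtcars %>% group_by(cyl) %>% tally()
counts.summ  <- mtcars %>% group_by(cyl) %>% summarise(n = n())

all.equal(counts.tally, counts.summ)
```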

df.gss %>%
  filter(year==2018) %>%
  group_by(race, sex, PRES16) %>%
  tally() %>%
  mutate(vote.share = prop.table(n))
## # A tibble: 40 x 5
## # Groups:   race, sex [6]
##    race  sex    PRES16                        n vote.share
##    <fct> <fct>  <fct>                     <int>      <dbl>
##  1 white male   iap                         217     0.282 
##  2 white male   Clinton                     188     0.244 
##  3 white male   Trump                       290     0.377 
##  4 white male   Other candidate (specify)    42     0.0546
##  5 white male   Didn't vote for president     9     0.0117
##  6 white male   Don't know                    8     0.0104
##  7 white male   No answer                    15     0.0195
##  8 white female iap                         314     0.340 
##  9 white female Clinton                     278     0.301 
## 10 white female Trump                       257     0.278 
## # … with 30 more rows
Exercise 1

As you can see, these answers include people who didn’t vote and people for whom the question is inapplicable. How would you alter this code so that your data only included people who answered Clinton or Trump?

Exercise 2

In the CSPP, the variable “right2work” tracks the passage of right to work laws. Make a pivot table that shows us the number of states that do and don’t have right to work laws in each year. (Hint: this is similar to counting how many people voted for each presidential option).

Exercise 3

What are the average personal incomes (“realrinc”) by race (“race”) for each year in the GSS?

Recoding Data

Our data often fail us. The thing we want is measured wrong, or needs to be constructed from other measures. This is where recoding data comes in. Of all the things you can do to data that we’ve talked about today, you’ll probably spend the most time recoding.

In the tidyverse, the fundamental recoding function is mutate(). mutate() can be used either to add a new variable or transform an existing variable. For example, in the GSS, if we wanted to take the log of household income (“realinc”), we would use mutate as follows.

df.gss %>%
  mutate(loginc = log(realinc))
## # A tibble: 64,814 x 6,109
##     year    id wrkstat  HRS1  HRS2 evwork   occ prestige wrkslf wrkgovt commute
##    <int> <int> <fct>   <int> <int> <fct>  <int>    <int> <fct>  <fct>     <int>
##  1  1972     1 WORKIN…    NA    NA <NA>     205       50 SOMEO… <NA>         NA
##  2  1972     2 retired    NA    NA yes      441       45 SOMEO… <NA>         NA
##  3  1972     3 WORKIN…    NA    NA <NA>     270       44 SOMEO… <NA>         NA
##  4  1972     4 WORKIN…    NA    NA <NA>       1       57 SOMEO… <NA>         NA
##  5  1972     5 KEEPIN…    NA    NA yes      385       40 SOMEO… <NA>         NA
##  6  1972     6 WORKIN…    NA    NA <NA>     281       49 SOMEO… <NA>         NA
##  7  1972     7 WORKIN…    NA    NA <NA>     522       41 SOMEO… <NA>         NA
##  8  1972     8 WORKIN…    NA    NA <NA>     314       36 SOMEO… <NA>         NA
##  9  1972     9 WORKIN…    NA    NA <NA>     912       26 SOMEO… <NA>         NA
## 10  1972    10 WORKIN…    NA    NA <NA>     984       18 SOMEO… <NA>         NA
## # … with 64,804 more rows, and 6,098 more variables: industry <int>,
## #   OCC80 <fct>, PRESTG80 <int>, INDUS80 <int>, INDUS07 <int>, occonet <dbl>,
## #   found <fct>, OCC10 <int>, occindv <fct>, occstatus <fct>, occtag <fct>,
## #   PRESTG10 <int>, PRESTG105PLUS <int>, INDUS10 <int>, indstatus <fct>,
## #   indtag <fct>, marital <fct>, martype <fct>, agewed <int>, divorce <fct>,
## #   widowed <fct>, spwrksta <fct>, SPHRS1 <int>, SPHRS2 <int>, spevwork <fct>,
## #   cowrksta <fct>, cowrkslf <fct>, coevwork <fct>, COHRS1 <int>, COHRS2 <int>,
## #   spocc <int>, sppres <int>, spwrkslf <fct>, spind <int>, SPOCC80 <int>,
## #   SPPRES80 <int>, SPIND80 <int>, SPOCC10 <int>, spoccindv <fct>,
## #   spoccstatus <fct>, spocctag <fct>, SPPRES10 <int>, SPPRES105PLUS <int>,
## #   SPIND10 <fct>, spindstatus <fct>, spindtag <fct>, COOCC10 <int>,
## #   COIND10 <fct>, PAOCC16 <int>, PAPRES16 <int>, pawrkslf <fct>,
## #   PAIND16 <int>, PAOCC80 <fct>, PAPRES80 <int>, PAIND80 <fct>, PAOCC10 <int>,
## #   paoccindv <fct>, paoccstatus <fct>, paocctag <fct>, PAPRES10 <int>,
## #   PAPRES105PLUS <int>, PAIND10 <int>, paindstatus <fct>, paindtag <fct>,
## #   MAOCC80 <fct>, MAPRES80 <int>, mawrkslf <fct>, MAIND80 <fct>,
## #   MAOCC10 <int>, maoccindv <fct>, maoccstatus <fct>, maocctag <fct>,
## #   MAPRES10 <int>, MAPRES105PLUS <int>, MAIND10 <fct>, maindstatus <fct>,
## #   maindtag <fct>, sibs <int>, childs <int>, age <int>, agekdbrn <int>,
## #   educ <int>, paeduc <int>, maeduc <int>, speduc <int>, coeduc <int>,
## #   codeg <fct>, degree <fct>, padeg <fct>, madeg <fct>, spdeg <fct>,
## #   MAJOR1 <fct>, MAJOR2 <fct>, dipged <fct>, spdipged <fct>, codipged <fct>,
## #   cosector <fct>, whenhs <int>, whencol <int>, sector <fct>, …

We can compare our new variable with our old one by using select() to view only those columns.

df.gss %>%
  mutate(loginc = log(realinc)) %>%
  select(realinc, loginc)
## # A tibble: 64,814 x 2
##    realinc loginc
##      <dbl>  <dbl>
##  1   18951   9.85
##  2   24366  10.1 
##  3   24366  10.1 
##  4   30458  10.3 
##  5   50763  10.8 
##  6   43994  10.7 
##  7   37226  10.5 
##  8   13537   9.51
##  9    2707   7.90
## 10   18951   9.85
## # … with 64,804 more rows

Here, we tell R to create a variable called loginc, and define it as the log of the variable realinc. Note that in doing so, we haven’t actually changed the object df.gss. To do that, we need to store the result back into df.gss.

df.gss %>%
  mutate(loginc = log(realinc)) -> df.gss

We can also use mutate to combine two variables. For example, we can divide household income by the square root of the number of household members (“hompop”), which is the Census Bureau’s recommended method of adjusting household income for household size.

df.gss %>%
  mutate(adj.realinc = realinc/sqrt(hompop)) -> df.gss

These are some pretty simple mathematical transformations of variables. R can do things that are much more interesting. For example, we can turn a continuous variable, like income, into a categorical variable that is easier to work with in some cases, like tables.

Another useful transformation is turning a continuous variable into a categorical one for use in tables. Essentially, this means taking something continuous, like income, and chopping it up into a few buckets. In the tidyverse, the ntile() function does this extremely nicely. Let’s say we want a variable of income quintiles.

df.gss %>%
  group_by(year) %>% #quintiles should be calculated with reference to the income distribution from the year of the survey
  mutate(realrinc.quintiles = ntile(realrinc, 5)) -> df.gss

Since we stored this transformation, we can now make some nice tables using income, which we couldn’t do before.

table(df.gss$degree, df.gss$realrinc.quintiles) %>% prop.table(1)
##                 
##                           1          2          3          4          5
##   LT HIGH SCHOOL 0.33639083 0.27258225 0.18683948 0.12801595 0.07617149
##   HIGH SCHOOL    0.22293596 0.23374166 0.22268698 0.18548949 0.13514590
##   JUNIOR COLLEGE 0.14003795 0.20417457 0.23377609 0.24060721 0.18140417
##   bachelor       0.12097137 0.11317644 0.17928347 0.25828212 0.32828661
##   graduate       0.06883532 0.06476910 0.10078420 0.24339239 0.52221900
##   dk                                                                   
##   iap                                                                  
##   na

A lot of recoding that you’ll want to do is based on “if then” type statements. For example, if someone is white and their income is in the bottom third of the distribution, we’d like to label them as white working class. Or if someone has said they are “very conservative” or “somewhat conservative,” we’d like to simplify that into a variable where either answer marks someone as “conservative.” For these situations, we want case_when().

case_when() typically gets inserted inside of a mutate() command. It has a basic two-part structure. The first part is an expression that can be evaluated as TRUE or FALSE. If the statement is TRUE for a given row, then R fills in the second part as that row’s new value.
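Before applying it to our survey data, here is case_when() on a bare vector, so you can see the condition ~ value structure in isolation:

```r
library(dplyr)

x <- c(3, 7, 12)

# Conditions are checked top to bottom; the first TRUE one wins.
case_when(x > 10 ~ "high",
          x > 5  ~ "medium",
          TRUE   ~ "low")
# "low" "medium" "high"
```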

For example, here we do a very simple recode that just gives us a 1 if someone makes above the median income, and a 0 if they don’t.

df.gss %>%
  group_by(year) %>%
  filter(is.na(realrinc)==FALSE) %>% #This line drops all observations where realrinc is NA
  mutate(abov.med = case_when(realrinc > median(realrinc) ~ 1,
                              realrinc <= median(realrinc) ~ 0)) %>%
  select(realrinc, abov.med)
## Adding missing grouping variables: `year`
## # A tibble: 37,887 x 3
## # Groups:   year [30]
##     year realrinc abov.med
##    <int>    <dbl>    <dbl>
##  1  1974     4935        0
##  2  1974    43178        1
##  3  1974    18505        0
##  4  1974    22206        1
##  5  1974    55515        1
##  6  1974     4935        0
##  7  1974     4935        0
##  8  1974    18505        0
##  9  1974    11103        0
## 10  1974    30841        1
## # … with 37,877 more rows

The tilde (“~”) separates the two parts here. realrinc > median(realrinc) can be evaluated as TRUE or FALSE. If it’s TRUE, R puts a 1 in the new variable abov.med. If it’s FALSE, R goes down to the next condition in your case_when() call, and checks whether it’s true or false. Since our conditions here are collectively exhaustive, we only need two. For the same reason, you never actually need to spell out your last condition: if everything not covered by one of your preceding conditions falls into the same category, you can just throw a TRUE in for your last one, and everything that’s left will get the value you associate with it.

df.gss %>%
  group_by(year) %>%
   filter(is.na(realrinc)==FALSE) %>%
  mutate(abov.med = case_when(realrinc > median(realrinc) ~ 1,
                              TRUE ~ 0)) %>%
  select(realrinc, abov.med)
## Adding missing grouping variables: `year`
## # A tibble: 37,887 x 3
## # Groups:   year [30]
##     year realrinc abov.med
##    <int>    <dbl>    <dbl>
##  1  1974     4935        0
##  2  1974    43178        1
##  3  1974    18505        0
##  4  1974    22206        1
##  5  1974    55515        1
##  6  1974     4935        0
##  7  1974     4935        0
##  8  1974    18505        0
##  9  1974    11103        0
## 10  1974    30841        1
## # … with 37,877 more rows

Here’s how you’d use case_when() to code people as white working class or not.

df.gss %>%
  group_by(year) %>%
   filter(is.na(realrinc)==FALSE) %>%
  mutate(inctile = ntile(realinc,3)) %>% #income into terciles by year
  mutate(wwc = case_when(race=="white" & inctile==1 ~ 1, #White and lowest tercile
                         TRUE ~ 0)) %>%
  select(year, inctile, race, wwc)
## # A tibble: 37,887 x 4
## # Groups:   year [30]
##     year inctile race    wwc
##    <int>   <int> <fct> <dbl>
##  1  1974       1 white     1
##  2  1974       2 white     0
##  3  1974       1 white     1
##  4  1974       2 white     0
##  5  1974       3 white     0
##  6  1974       1 white     1
##  7  1974       3 black     0
##  8  1974       2 white     0
##  9  1974       1 black     0
## 10  1974      NA white     0
## # … with 37,877 more rows
Exercise 1

Use case_when() to create a new variable that tells us whether someone’s height (“height”) is greater than average. (Bonus: use group_by() to make this comparison year and sex specific.)

Exercise 2

The GSS has an unnecessarily complicated political party identification scheme. Take a look at it:

df.gss$partyid %>% table()
## .
##    STRONG DEMOCRAT   NOT STR DEMOCRAT       IND,NEAR DEM        independent 
##              10378              13294               7792               9888 
##       IND,NEAR REP NOT STR REPUBLICAN  STRONG REPUBLICAN        OTHER PARTY 
##               5721               9933               6318               1072 
##                 DK                 NA 
##                  0                  0

Why not simplify it into three categories: Democrat, Republican, and independent?

Exercise 3

The GSS asks a number of questions about national spending priorities. Let’s look at two, “natheal”, which looks at healthcare, and “nateduc”, which looks at education. Let’s say we wanted to group respondents into three categories: cutters, who want to reduce spending on both; spenders, who want to increase spending on both; and midroaders, who want something in between. How would you use case_when() to create a new variable, spending.attitude, with these categories?

Merging Data

Sometimes we want variables from multiple datasets. This is one area where R has a huge advantage over programs like Stata or SPSS, because it’s very simple in R to work with multiple datasets at once.

The function in the tidyverse for merging datasets is inner_join(). inner_join() takes three arguments: your first dataset, your second dataset, and the columns that identify unique observations. For example, in our CSPP dataset, the combination of state and year is sufficient to identify a unique row. So if we were merging another dataset with the full CSPP dataset, it would also need to have a variable called state and a variable called year. Merging those two datasets would look something like this.

inner_join(df.cspp, second.df, by=c("state", "year"))
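Here is a self-contained toy version of that merge, just to show the mechanics. The column names state and year mirror the CSPP example; all the values are invented for illustration.

```r
library(dplyr)

# Toy state-year panel with a (made-up) unemployment variable
states <- data.frame(state = c("WI", "NY", "WI", "NY"),
                     year  = c(2000, 2000, 2001, 2001),
                     unemployment = c(3.5, 4.6, 4.2, 4.9))

# Toy second dataset, keyed on the same state and year columns
laws <- data.frame(state = c("WI", "NY"),
                   year  = c(2000, 2000),
                   right2work = c(0, 0))

# Only rows whose (state, year) combination appears in BOTH datasets survive
inner_join(states, laws, by = c("state", "year"))
```

Note that the 2001 rows drop out: inner_join() keeps only observations matched in both datasets. If you want to keep unmatched rows, left_join() and full_join() take the same arguments but retain them.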

To illustrate how this works, I’ve prepared two very simple, pre-processed datasets that can be easily merged. One comes from CSPP, and has states, years, and right to work status. The other comes from the American National Elections Survey, and has states, years, and respondents’ “feeling thermometer” for big business. You can download them both from my github here and here. Just use ctrl + s (or cmd + s if you’re on a Mac) to save the files, then load them into R and assign them names using read.csv().

Once you’ve done that, merge them, and save the resulting dataset.

inner_join(name1, name2, by=c("state", "year")) -> df.merged

Examine the resulting dataset. As you can see, it has about 34,000 rows. Effectively, what we have now are the ANES data with one extra variable added that isn’t in that survey: whether the respondent lived in a state that was right to work in the year of the study.

From here, it’s very simple to turn this into a pivot table that shows us the average feeling thermometer for respondents in right to work and non-right to work states by year.

df.merged %>%
  group_by(year, right2work) %>%
  summarise(avg.temp = mean(big.bus))

Note: all the real work of making this table consisted of processing the datasets to make them easy to merge. You can see the R script I used to do that, and download it and run it on your computer, here.

Merging datasets will always be a pain. Two variables that measure the same thing will be named different things, or will have a slight difference in their coding that you have to identify and fix before you can merge them. You should never expect it to work the first time you try to merge two datasets.
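One common fix, sketched here with invented toy data: when the key columns have different names in the two datasets, you can either rename() one of them first, or use the named form of the by argument.

```r
library(dplyr)

a <- data.frame(state = c("WI", "NY"), x = 1:2)
b <- data.frame(st    = c("WI", "NY"), y = 3:4)

# by = c("state" = "st") matches a$state against b$st,
# so you don't have to rename either column before merging
inner_join(a, b, by = c("state" = "st"))
```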

Data Visualization

Words divide, pictures unite. - Otto Neurath

Data visualization is the most effective way to present data. A table of numbers will always take your reader more effort to decipher than a graphical plot will. A lot of the tables that we made in the previous section would be much better as plots. So let’s plot some data.

The best way to make plots in R is with the library ggplot2, which is part of the tidyverse. Since you already installed the tidyverse, you already have ggplot2.

ggplot2 is based around two fundamental concepts: aesthetics and geometries. Aesthetics are variables in your dataset mapped to features of your plot. For example, if you place a variable like education on the x axis, and a variable like income on the y axis, each of those is an aesthetic. There are lots of other aesthetics besides the x and y axes, but for now, let’s stick with those two.

Geometries are how your data are represented. Line graph, bar graph, heat map, box plot, histogram — all of these are geometries in ggplot2.

Here’s a simple example of what it looks like, using mtcars.

mtcars %>%
  ggplot(aes(x=wt, y=mpg)) +
  geom_point()

As you can see, the two parts of ggplot are contained in two different functions. First, we have the ggplot() function. Inside it, we place the aes() function, whose arguments are the different aesthetics we want to map. Then, our geometry function, geom_point(), is added with a “+” sign. Note that it is NOT a pipe. Once you feed your data into ggplot(), subsequent ggplot functions that change your graph are added with “+”, not “%>%”.

We can add another aesthetic to this plot easily. Let’s say we also want to see how the number of cylinders fits into this relationship. One way to do that is by having the points be different colors based on the number of cylinders.

mtcars %>%
  ggplot(aes(x=wt, y=mpg, color=as.factor(cyl))) + #Turn a continuous variable into a categorical one.
  geom_point()

Note that I transformed cyl, a numeric variable, into a categorical one using the function as.factor(). This didn’t change the dataset at all. Instead, it told R to transform the data in the process of plotting it. Color can also be used to represent numeric variables, but in this case, since cyl is a variable with three levels, it made more sense to tell R to treat it as categorical, rather than as numeric.
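You can verify that as.factor() converts on the fly without touching the stored data:

```r
# cyl is stored as numeric; as.factor() only changes how it is
# treated at the moment of use, not the underlying data frame
class(mtcars$cyl)             # "numeric"
class(as.factor(mtcars$cyl))  # "factor"
```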

Another aesthetic that can be used is size. This time, let’s use it to look at a continuous variable — horsepower.

mtcars %>%
  ggplot(aes(x=wt, y=mpg, size=hp)) + 
  geom_point()

These plots aren’t very pretty. The default grey ggplot2 backdrop is ugly (IMO), and the axis labels aren’t very informative to anyone who doesn’t know the dataset. There are two functions I use a lot to clean up plots: labs() and theme_bw(). labs() adds labels, and theme_bw() offers a more modern looking backdrop.

mtcars %>%
  ggplot(aes(x=wt, y=mpg, size=hp)) + 
  geom_point() +
  labs(title="Weight, Mileage, and Horsepower",
       subtitle = "You can put a subtitle here",
       x = "Weight",
       y = "Miles Per Gallon",
       size = "Horsepower",
       caption = "You can put a caption here") +
  theme_bw()

Exercise 1

Take a look at the dataset diamonds, which is included with ggplot2. Make a plot with carat on the x axis, price on the y axis, and then choose one other variable to map to either color or size. Is that variable numeric or categorical? Is it, like cyl, coded as numeric but actually functioning more like a categorical variable, and therefore in need of transformation? Make your plot and show it off!

Types of graphs

There are lots of different kinds of graphs. In this tutorial, we’re going to focus on a few of the most common, while highlighting some of the tricks you can use to represent data in different ways. In these examples, we will mostly work with variables that aren’t going to require a lot of recoding or other carpentry on our part. But keep in mind: most of the time you make graphs, most of the work will be the data carpentry needed to get the data into the shape you want, not making the graph itself (though sometimes that can be tricky too!).

Histograms

One of the simplest kinds of graphs is the histogram. Histograms represent the distribution of a numeric variable. Here’s a histogram of height in the GSS, for all years. (Note: you may be wondering how I know what variables are included among the 6,000 GSS variables and 2,000 CSPP variables. The answer is that I went through the codebooks, which you have to do with basically any data set you use. The codebook for the GSS is here, and the codebook for the CSPP is here.)

df.gss %>%
  select(height) %>% 
  ggplot(aes(x=height)) + geom_histogram()
## Adding missing grouping variables: `year`
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 62182 rows containing non-finite values (stat_bin).

What’s up with that big spike just over 70 inches? Seems statistically improbable!

As you can see in the code, histograms only require one aesthetic (i.e., one variable). That variable is mapped onto the x axis, and the y axis is then automatically the count, or frequency, of observations at each value of the x axis variable.

There are two primary ways you’d change histograms. First, you can alter how fine-grained the binning of the X axis is. This affects how many bars are in your plot. You alter this with the “bins” argument in the geom_histogram() function. The default is 30 bins, which in this case actually lines up with our observed range (about 45 to 85) pretty well.

df.gss %>%
  select(height) %>% 
  ggplot(aes(x=height)) + geom_histogram(bins=60)
## Adding missing grouping variables: `year`
## Warning: Removed 62182 rows containing non-finite values (stat_bin).
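Besides bins, geom_histogram() also accepts a binwidth argument, which sets the width of each bar in the units of your variable rather than the total number of bars. A sketch on the built-in mtcars data, so it runs on its own:

```r
library(ggplot2)

# Each bar spans 2 miles per gallon, however many bars that takes
ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 2)
```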

The other primary way to alter a histogram is by adding another aesthetic (i.e. another variable). Here, the obvious one to add is sex.

df.gss %>%
  select(height, sex) %>% 
  ggplot(aes(x=height, fill=sex)) + geom_histogram(position = "dodge")
## Adding missing grouping variables: `year`
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 62182 rows containing non-finite values (stat_bin).

Here, we need to add sex to our select() call, since we need it in the data we’re sending into ggplot(). Then, we add the “fill” aesthetic in our aes() function. Note that this is different from the “color” aesthetic we used above. With points and lines, you use color. With bar graphs and other polygon plots, you use “fill”. For polygon plots, “color” only affects the outline of the polygons, not their internal colors. We also need to add the ‘position = “dodge”’ argument to geom_histogram(), which tells R to put the bars side by side for each height, rather than on top of one another.

Exercise 1

Choose a numeric variable from the GSS (you can use the data explorer to search for variables, or just ask me for suggestions). Create a histogram of it, using a second variable to split the data (I’d suggest something simple and dichotomous or trichotomous, like race (“race”), or sex (“sex”), or divorce status (“divorce”), or ever unemployed in last ten years (“unemp”)).

Exercise 2

Use the filter() function to subset the GSS based on some variable, and then create a histogram of a variable from the subsetted data.

Scatter Plots

Scatter plots are good for data with lots of observations, like surveys or panel data. They work best when both the x and y variables are continuous. For example, in the CSPP, we can look at the evangelical share of state population against a state’s per capita corrections spending (we have corrections expenditure and total population, so we can easily construct that measure).

df.cspp %>%
  as_tibble() %>%
  select(st,
         year,
         poptotal,
         exp_correction,
         evangelical_pop) %>%
  mutate(pcec = (exp_correction*1000)/poptotal) %>% #Creating the per capita corrections spending variable
  ggplot(aes(x=evangelical_pop, y=pcec)) + geom_point()
## Warning: Removed 6020 rows containing missing values (geom_point).

We could add information to this plot by mapping a variable to the color aesthetic. The CSPP uses a four region coding scheme, in which the south is 1, the west is 2, the midwest is 3, and the northeast is 4.

df.cspp %>%
  as_tibble() %>%
  select(st,
         year,
         poptotal,
         exp_correction,
         evangelical_pop,
         region) %>% #adding region to our chosen variables
  mutate(pcec = (exp_correction*1000)/poptotal) %>% 
  ggplot(aes(x=evangelical_pop, y=pcec, color=as.factor(region))) + #Once again, converting a numeric variable to categorical
  geom_point()
## Warning: Removed 6020 rows containing missing values (geom_point).

There are a number of other aesthetics you can use with geom_point(). Shape and size are the two most useful. Here’s the same plot, but using shape instead of color to represent region.

df.cspp %>%
  as_tibble() %>%
  select(st,
         year,
         poptotal,
         exp_correction,
         evangelical_pop,
         region) %>% #adding region to our chosen variables
  mutate(pcec = (exp_correction*1000)/poptotal) %>%
  ggplot(aes(x=evangelical_pop, y=pcec, shape=as.factor(region))) + #Once again, converting a numeric variable to categorical
  geom_point()
## Warning: Removed 6020 rows containing missing values (geom_point).

If you wanted, you could combine all three of them to represent five different variables’ relationships on one plot. That would make for a very difficult-to-read plot. In practice, you probably only want to map 3 variables (maybe 4 if you have a very good reason).

For data like this, our x and y scales work pretty well: the variables are continuous enough that few points land exactly on top of one another. In some datasets, that won’t be the case. For example, the GSS has about 2,000 observations for each wave, and a variable like education only takes a small number of whole-number values, so the points stack up. The result looks pretty ugly and uninformative if we use geom_point().

df.gss %>%
  filter(year==2018) %>%
  ggplot(aes(x=educ, y=realrinc)) + geom_point()
## Warning: Removed 986 rows containing missing values (geom_point).

For variables like this, geom_jitter() is a very helpful function. It spreads out the points a little, making the density of observations at each value of the x axis a lot clearer.

df.gss %>%
  filter(year==2018) %>%
  ggplot(aes(x=educ, y=realrinc)) + geom_jitter()
## Warning: Removed 986 rows containing missing values (geom_point).

For plots like this, it can be helpful to draw lines of best fit. The function geom_smooth() lets us do this.

df.gss %>%
  filter(year==2018) %>%
  ggplot(aes(x=educ, y=realrinc)) + geom_jitter() +
  geom_smooth(se=FALSE) # se=FALSE tells it not to draw a confidence interval
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## Warning: Removed 986 rows containing non-finite values (stat_smooth).
## Warning: Removed 986 rows containing missing values (geom_point).

By default, geom_smooth() fits a nonparametric smoother that calculates locally weighted means of the y variable across values of the x variable (loess for smaller datasets, and a generalized additive model for larger ones, as the message above indicates). But you can also have it draw a linear regression line, equivalent to the line \(y=a + bx\) you would get by fitting a bivariate regression.

df.gss %>%
  filter(year==2018) %>%
  ggplot(aes(x=educ, y=realrinc)) + geom_jitter() +
  geom_smooth(method = "lm", #Draw a linear regression line
              se=FALSE) # se=FALSE tells it not to draw a confidence interval
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 986 rows containing non-finite values (stat_smooth).
## Warning: Removed 986 rows containing missing values (geom_point).

Sometimes, it’s useful to plot words rather than dots. For example, if we are making a plot where our observations are the 50 states, we might want state abbreviations, rather than colored dots, placed at the variable values. This is accomplished with geom_text(), which requires the “label” aesthetic to be specified in your ggplot() call.

df.cspp %>%
  select(st, year, healthfree, infantmortality) %>%
  filter(year==2001) %>%
  ggplot(aes(x=healthfree, y=infantmortality, label=st)) + geom_text() +
  geom_smooth(method = "lm",
              se=FALSE) +
  theme_bw() +
  labs(title = "Health Insurance Freedom and Infant Mortality in the States, 2001",
       x = "Health Insurance Freedom Index",
       y = "Infant Mortality Rate (deaths per 1,000 live births)",
       caption = "Data: Correlates of State Policy Project, Michigan State University")
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 1 rows containing non-finite values (stat_smooth).
## Warning: Removed 1 rows containing missing values (geom_text).

Exercise 1

Use CSPP for the year 2001 to make a scatter plot of state education spending per student (“edinstruct_expend_pstud”) and state high school dropout rate (“eddropoutrate”), with total enrolled students (“enrollstudents”) mapped to the size aesthetic. Use theme_bw() and labs() to make it look nice!

Line plots

Line plots work for data that are grouped, such that there is a defined number of observations for each value on the x axis. The simplest way to do this is with a group of 1. For example, we could look at how average years of education has changed over time in the GSS.

df.gss %>%
  group_by(year) %>%
  summarise(avg_ed = mean(educ, na.rm=TRUE)) %>%
  ggplot(aes(x=year,y=avg_ed)) + geom_line()
## `summarise()` ungrouping output (override with `.groups` argument)

We can also break this down by gender by mapping it to color.

df.gss %>%
  group_by(year, sex) %>%
  summarise(avg_ed = mean(educ, na.rm=TRUE)) %>%
  ggplot(aes(x=year,y=avg_ed, color=sex)) + geom_line()
## `summarise()` regrouping output by 'year' (override with `.groups` argument)

As you can see, line graphs are often good for representing pivot tables.

One alternative to color as an aesthetic is linetype (useful when submitting plots for journals that only publish in greyscale).

df.gss %>%
  group_by(year, sex) %>%
  summarise(avg_ed = mean(educ, na.rm=TRUE)) %>%
  ggplot(aes(x=year,y=avg_ed, linetype=sex)) + geom_line()
## `summarise()` regrouping output by 'year' (override with `.groups` argument)

Finally, you can also add points to a line plot.

df.gss %>%
  group_by(year, sex) %>%
  summarise(avg_ed = mean(educ, na.rm=TRUE)) %>%
  ggplot(aes(x=year,y=avg_ed, linetype=sex)) + geom_line() + geom_point()
## `summarise()` regrouping output by 'year' (override with `.groups` argument)

Exercise 1

Use mutate() and case_when() to make a new, numeric version of variable “abany” (support abortion being legal for any reason). Then, use group_by(), summarise(), and mean() to make a pivot table of the average answer to this question for each year in the GSS. Plot it using geom_line(). Bonus: break it down by gender!

Bar Plots

Bar plots can be used in some of the same situations that line plots can. For example, here are the data from the previous plot, but with geom_bar().

df.gss %>%
  group_by(year, sex) %>%
  summarise(avg_ed = mean(educ, na.rm=TRUE)) %>%
  ggplot(aes(x=year,y=avg_ed, fill=sex)) + 
  geom_bar(stat="identity",
           position="dodge")
## `summarise()` regrouping output by 'year' (override with `.groups` argument)

In this case, a line plot is clearly the superior way to present these data. Notice a few changes. First, I had to change linetype to fill, since linetype isn't an aesthetic available to bar plots. Second, geom_bar() needs two arguments: stat="identity" tells R that I'm giving it a ready-made table, so it doesn't have to do any counting, and position="dodge" tells R to put the two values of sex side by side, instead of stacking them.
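To see what position="dodge" is changing, here is the same plot with geom_bar()'s default position (a sketch reusing the pivot table from above; without "dodge", the bars for each year are stacked on top of one another):

```r
df.gss %>%
  group_by(year, sex) %>%
  summarise(avg_ed = mean(educ, na.rm=TRUE)) %>%
  ggplot(aes(x=year, y=avg_ed, fill=sex)) + 
  geom_bar(stat="identity") # the default, position="stack", piles male and female on top of each other
```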

Without stat=“identity”, geom_bar()’s default behavior is to count observations, and plot the counts. For example, here is a bar plot of “relig” for the year 2010.

df.gss %>%
  filter(year==2010) %>%
  ggplot(aes(x=relig)) + geom_bar()

The data here are identical to using table() on “relig” for the year 2010. Here, we don’t need position=“dodge”, because we don’t have any fill aesthetic to decide whether to stack or place side by side.
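You can check this equivalence for yourself (a quick sketch, assuming df.gss as loaded earlier):

```r
# The counts behind the bar plot match a simple frequency table
df.gss %>%
  filter(year==2010) %>%
  pull(relig) %>% # pull() extracts a single column as a vector
  table()
```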

Like line plots, bar plots are good for pivot tables. Here are average educational expenditures per student by region for the year 2008.

df.cspp %>%
  filter(year==2008) %>%
  group_by(region) %>%
  summarise(avg_spending = mean(edinstruct_expend_pstud, na.rm=TRUE)) %>%
  ggplot(aes(x=region, y= avg_spending)) + geom_bar(stat="identity")
## `summarise()` ungrouping output (override with `.groups` argument)

cheatsheet: https://rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf

Faceting

Faceting draws a series of graphs, based on the values of an additional variable. In that sense, you can think of faceting like an additional aesthetic, except unlike other aesthetics, it’s not specified in the ggplot() call. There are two ways to facet: facet_wrap() and facet_grid(). We’ll focus on facet_wrap() here.

facet_wrap() takes a one-sided formula: a tilde ("~") followed by a faceting variable. It then draws one panel per value of that variable, wrapping the panels into rows and columns (you can control the layout with the nrow and ncol arguments). facet_grid(), by contrast, takes a two-sided formula, rows ~ columns, and lays the panels out on a grid. For example, here is the education/income scatterplot we looked at above, made for every year in the GSS.

df.gss %>%
  ggplot(aes(x=educ, y=realrinc)) + geom_jitter() +
  geom_smooth(method = "lm",
              se=FALSE) + # se=FALSE tells it not to draw a confidence interval
  facet_wrap(~year)
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 26969 rows containing non-finite values (stat_smooth).
## Warning: Removed 26969 rows containing missing values (geom_point).

In this case, the number of graphs makes faceting not terribly useful for an image of this size. But the basic principle is always the same. You could, for example, make a graph of the relationship between income and vote choice, and draw a different graph for voters of each race.
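As a sketch of that kind of two-variable faceting, here is the same education/income plot split on a grid by sex and race with facet_grid() (this assumes df.gss includes the usual GSS "race" variable):

```r
df.gss %>%
  ggplot(aes(x=educ, y=realrinc)) + geom_jitter() +
  facet_grid(sex ~ race) # rows by sex, columns by race
```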

Maps

One cool use for ggplot2 is maps. Since ggplot2 can make polygons, it can draw maps. Making maps can be a little tricky, because it always involves merging data, and merging data almost always involves some data cleaning. But once you get the hang of it, it can be a really nice way to represent data.

In this tutorial, I’m going to focus on US maps, though you can also download packages that let you draw world maps for international data. My preferred map package is “usmap”. Install it and load it now.
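If you haven't used it before, that looks like this:

```r
install.packages("usmap") # only needs to be run once
library(usmap)            # run this every session
```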

To draw a map, we need a data frame that tells R where to draw lines. We get that dataframe with the following code.

us_map(regions="states") %>% as_tibble()
## # A tibble: 12,999 x 9
##           x         y order hole  piece group fips  abbr  full   
##       <dbl>     <dbl> <int> <lgl> <int> <chr> <chr> <chr> <chr>  
##  1 1091779. -1380695.     1 FALSE     1 01.1  01    AL    Alabama
##  2 1091268. -1376372.     2 FALSE     1 01.1  01    AL    Alabama
##  3 1091140. -1362998.     3 FALSE     1 01.1  01    AL    Alabama
##  4 1090940. -1343517.     4 FALSE     1 01.1  01    AL    Alabama
##  5 1090913. -1341006.     5 FALSE     1 01.1  01    AL    Alabama
##  6 1090796. -1334480.     6 FALSE     1 01.1  01    AL    Alabama
##  7 1090235. -1304438.     7 FALSE     1 01.1  01    AL    Alabama
##  8 1089944. -1289528.     8 FALSE     1 01.1  01    AL    Alabama
##  9 1089451. -1265304.     9 FALSE     1 01.1  01    AL    Alabama
## 10 1089305. -1258363.    10 FALSE     1 01.1  01    AL    Alabama
## # … with 12,989 more rows

As you can see, this data frame has, for each state, a series of x and y values and a number of other variables. To use it, we should save it as a data frame.

us_map(regions="states") %>% as_tibble() -> df.map

Now we need some data to map. Let’s use the CSPP unemployment data we’ve been using. To do this, we need to merge the data, and to do that, we need a variable with the same name (easy to fix) that’s coded the exact same way (can be a major pain) in both datasets.

There’s a shortcut to checking this. It involves the unique() function, which takes a vector or dataframe as an argument, and outputs the unique elements (or, in a data frame, rows) in it. This lets us check whether the coding in the two variables is the same. In this case, let’s look at the abbreviated state names, since those will be easier to change if they need to be.
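For example, on a small vector (a toy illustration, not from our data):

```r
unique(c("AL", "AK", "AL", "AZ"))
## [1] "AL" "AK" "AZ"
```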

unique(df.cspp$st) == unique(df.map$abbr)
##  [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [16] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [31] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [46] TRUE TRUE TRUE TRUE TRUE TRUE

In this case, the two are identical, so we only need to change the name of one of them to make them mergeable. Let’s take only the variables we need from our measured data to reduce clutter. I’ll also filter by year to give us only one year of data, though the procedure would be identical if I were merging with all years.

df.cspp %>%
  filter(year==2010) %>%
  select(abbr = st, #changing the name of this variable to match the one in df.map
         unemployment) -> df.donor

Now it’s simply a matter of merging the data sets, and sending the resulting data set into ggplot().

inner_join(df.map, df.donor, by="abbr") %>%
  ggplot(aes(x=x,y=y,fill=unemployment, group=group)) +  #Note you need group=group here.
  geom_polygon(color="black") + #color="black" draws state lines in black 
  coord_fixed() #This tells R to preserve the aspect ratio, no matter the size of the image

In the above example, we used a continuous variable (unemployment) for the fill aesthetic. We can also use categorical variables. In the next example, we'll use a categorical variable, "fsspr", which rates the strength of each state's development planning role as weak, significant, or substantial. Since the data in the dataframe are stored as 1, 2, 3, I have to recode it with labels using case_when().

df.cspp %>%
  filter(year==2009) %>% #Data stop in 2009 for this variable
  select(abbr = st, #changing the name of this variable to match the one in df.map
         fsspr) %>% #selecting fsspr rather than unemployment
 mutate(fsspr2 = case_when(fsspr==1 ~ "weak",
                           fsspr==2 ~ "significant",
                           fsspr==3 ~ "substantial")) -> df.donor

inner_join(df.map, df.donor, by="abbr") %>%
  ggplot(aes(x=x,y=y,fill=fsspr2, group=group)) +  #Note you need group=group here.
  geom_polygon(color="black") + #color="black" draws state lines in black 
  coord_fixed() #This tells R to preserve the aspect ratio, no matter the size of the image

Animations

An extension of ggplot2, the library gganimate can create animated graphics. While these animations obviously can't be used in papers, they can be quite nice for presentations and other forms of public scholarship.
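As with usmap, you'll need to install and load gganimate before the code below will run:

```r
install.packages("gganimate") # only needs to be run once
library(gganimate)
# Note: gganimate may also prompt you to install a renderer such as gifski
```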

Fundamentally, animation does the same thing faceting does: it takes a variable and splits your plot into multiple plots based on the values of that variable. Instead of using facet_wrap() or facet_grid(), however, you use one of gganimate's transition functions. To see what I mean, let's look at diamonds first faceted by color, and then animated.

diamonds %>% ggplot(aes(x=carat,y=price, color=clarity)) + geom_point() + facet_wrap(~color)

diamonds %>% ggplot(aes(x=carat,y=price, color=clarity)) + geom_point() + transition_states(color)

As you can see, the only difference in the code is substituting transition_states() for facet_wrap(). As you saw when running this code, animation takes time. Because of this, it’s often helpful to test your code using facet_wrap(), and only move it to an animation when you’re sure all the other parts of it are working correctly.

Let’s do one with GSS data. First, we’ll make a pivot table that plots average income by year, highest education degree, and sex.

df.gss %>%
  group_by(year, degree, sex) %>%
  summarise(avg_income = mean(realrinc, na.rm=TRUE))%>%
  ungroup() %>%
  filter(complete.cases(.)) -> df.edsex
## `summarise()` regrouping output by 'year', 'degree' (override with `.groups` argument)

Then, we can make an animation that shows the relationship by year.

df.edsex %>%
  ggplot(aes(x=degree,y=avg_income, color=sex, group=sex)) + geom_line() + #we need group here because R doesn't know whether to group by year or sex at this point.
  transition_states(year)

This doesn't look great. The animation moves too quickly to really be interpretable. By default, gganimate makes a gif of 100 frames that plays at 10 frames per second. Since we have 30 years of data, that's about 1/3 of a second per year. We can change that with the function animate(), which can take the arguments nframes, fps, and duration (specifying any two of those arguments automatically determines the third). To use animate(), we need to save our animation, and then pass it to animate().

df.edsex %>%
  ggplot(aes(x=degree,y=avg_income, color=sex, group=sex)) + geom_line() + #we need group here because R doesn't know whether to group by year or sex at this point.
  transition_states(year) -> p1

animate(p1, 
        nframes=300,
        fps=10)

Looking better! Now let's add some decent labeling. One really nice thing transition_states() does is give us the object {closest_state}, which we can insert into any string, and R will fill in the value of the variable we have mapped to transition_states() (in our case, year) for that point in the animation.

df.edsex %>%
  ggplot(aes(x=degree,y=avg_income, color=sex, group=sex)) + geom_line() + #we need group here because R doesn't know whether to group by year or sex at this point.
  transition_states(year) +
  theme_bw() +
  labs(title = "Education, Income, and Gender, 1974-2018",
       subtitle = "Year: {closest_state}",
       x="Educational Degree",
       y="Average Income",
       caption = "Data: General Social Survey")-> p1

animate(p1, 
        nframes=300,
        fps=10)

Exercise 1

Make an animation of a bar plot that shows the changing distribution of the variable "partyid" by year (don't worry about futzing with fps and the like; just try to get an animation made!).

Exercise 2: Final Project

Take the variables “st”, “year”, and “right2work” from the CSPP, and merge them with your df.map dataframe. Then, make an animation that shows the passage of right to work laws by year.